Google's Crowdsourcing the Training Set
In all the Google news today, one piece that hasn't gotten much attention is the YouTube speech recognition tool. There's a hidden bit of genius in Google's implementation of this easy captioning tool for YouTube videos (which, by the way, will rock for educational accessibility). Right now it's only turned on for a few trusted sites, but Google is looking to beef up its speech recognition classifier even more by gathering audio tracks from all its YouTube videos. That is a much richer dataset than GOOG-411 could provide (or even Google Voice voicemails, really).
That's not the genius part, though. The real genius is Google's "automatic caption timing": you type in a transcript of your video, and YouTube automatically chunks it into timed caption blocks based on what it can decipher from the audio. Adding captions is now a cakewalk, so why wouldn't you do it?
Let me say that again, another way. Google has found a way to motivate people to send in text versions of their audio tracks of their own free will. That is an accurate textual representation of exactly the audio that a machine learning algorithm is going to try to classify as, well, text. Google has just won itself a huge, presumably fairly accurate training set for speech-to-text transcription, donated by the not-insignificant subset of YouTube's giant user base that wants captions on their videos.
Google Voice voicemails may actually spit out something legible in the near future.