YouTube Starts to Recognize Speech

Google has announced an interesting new beta that they call Google Audio Indexing. (I’ll also make the obligatory mention that Google says Gmail is still in beta.) The idea behind this little creation is that speech recognition software developed by Google will go through YouTube videos and try to interpret what is being said. It will then create a transcript of each video that you can search using this indexer.

At this point, they are only using it on videos related to the US election. Their FAQ says that they are doing this both for technical reasons (my guess is that their software interprets speeches the most accurately) and because it is one of the more popular areas of the site. They also point out that it is an election year and ask “what information could be more important than what describes the views, actions and platforms of the two presidential candidates?”

I gave this new software a few basic queries to see what it found. The most obvious thing to me was to try “lipstick pig” to see what came up. The search only comes up with two results. The first is Barack Obama’s most recent news conference mocking the controversy, and the second is a reference that Congressman Tom Tancredo (R-CO) made in 2007. It is interesting that Obama’s original statement didn’t come up in its original context, and neither did the video that has circulated of John McCain also using that expression.

You get much more interesting results with a search like “wall street.” This term turns up a variety of hits, including campaign ads, stump speeches and even a Ralph Nader commentary.

This is a tool that definitely has a lot of potential and a variety of uses. It will be interesting to see if it gets expanded beyond the initial set of videos it has indexed. While the search function is pretty good, I would love to be able to see more of the surrounding text for each hit. I would also like to be able to filter out certain types of content, such as ads, since you can get a lot of extraneous hits that you did not want. Overall, I think the software does a pretty good job of interpreting speech.

