copiesofcopies/youtube-transcription · GitHub
Open government developer Waldo Jaquith had a problem: he wanted transcripts for videos of the Virginia legislature but had neither the resources to fund their creation nor the time to transcribe the sessions himself.
When he talked to Matt Cutts at the Newsfoo unconference last December, Google’s lead for Web spam suggested that Jaquith make use of YouTube’s ability to automagically create machine-generated transcripts of video.
Last week, Jaquith posted a $500 bounty for a speech transcription program, funded by the 95 backers of a Kickstarter campaign to liberate Virginia’s legislative video.
That’s when something interesting happened, as Jaquith blogged today: Aaron Williamson, a lawyer for the Software Freedom Law Center, created a Python script to fix the problem.
It took just 27 hours for the $500 speech transcription bounty to be claimed. Aaron Williamson produced youtube-transcription, a Python-based pair of scripts that upload video to YouTube and download the resulting machine-generated transcripts of speech.
Jaquith intends to use the code in the Richmond Sunlight project – and because it’s open source, anyone else can press it into service as a means to generate transcripts of video.
The quality of YouTube’s machine-generated transcriptions is, to be fair, mixed, although it is improving. That said, they’re better than none at all.
Williamson told Jaquith that he’ll donate the Kickstarter cash to charity.
@waldojaquith @digiphile @mattcutts (@copiesofcopies, BTW, has directed the bounty to a pair of charities.)
— Waldo Jaquith (@waldojaquith) March 25, 2013
oreillyradar posted this