How to fix YouTube auto-generated subs?

Posted: June 22, 2018 in Tools

Ever since I started giving CELTA Technology input sessions about two years ago, I’ve invariably told the trainees how to look for videos with high-quality subtitles on YouTube (answer: add ‘, cc’ to the search, e.g. ‘unboxing & comparison, cc’) and how to check whether the video you want to use has good subs or auto-generated subs (answer: open the transcript by clicking on the sign under the video and check if it says ‘English auto-generated’ – if it doesn’t, it was uploaded by a human being). Sometimes the video has both auto-generated subs and good subs, in which case you’ll be able to switch to the good transcript. After that you can copy it by dragging your mouse over it while holding the left button, the way you’d do with any text on the internet.
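If you prepare search links for trainees in advance, the ‘, cc’ trick can be scripted. This is only a rough sketch: the function name `cc_search_url` is made up for illustration, and the `results?search_query=` address pattern is my assumption about how YouTube search links are built rather than anything documented in this post.

```python
from urllib.parse import urlencode

# Assumption: YouTube search pages accept the query in a
# "search_query" parameter on the /results path.
SEARCH_BASE = "https://www.youtube.com/results?"


def cc_search_url(query: str) -> str:
    """Return a search link for `query` with ', cc' appended."""
    # urlencode escapes characters such as '&' and ',' so the
    # whole phrase travels as one search term.
    return SEARCH_BASE + urlencode({"search_query": f"{query}, cc"})


if __name__ == "__main__":
    print(cc_search_url("unboxing & comparison"))
```

Pasting the printed link into the browser should be equivalent to typing the query with ‘, cc’ into the search box by hand.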

However, what I didn’t realize was that, even if the video only had auto-generated subs, I’d still be able to use the transcript with my learners. I’d always assumed that auto-generated transcripts were so inaccurate that it would take ages to edit them – too much hassle! But then a couple of months ago I was looking for a perfect phone comparison video to explore expressions for comparison with my group. I found a great video that contained lots of very useful language and, although it was quite long (nine minutes), I wanted the transcript so badly that I was prepared to spend hours weeding mistakes out of the auto-generated subs.

Only it didn’t take me an hour – it actually took me about 10 minutes. 

Let me walk through what I did:

First, I copied the auto-generated transcript and pasted it into oTranscribe.com, a nifty free web service for transcribing audio and video.
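Depending on how you copy it, the transcript may come with timestamps mixed in. Here is a minimal sketch for stripping them before pasting. It assumes each timestamp sits on its own line and looks like 0:05 or 1:02:03. That layout is my assumption, so check it against what you actually copied, and note that `strip_timestamps` is just an illustrative name.

```python
import re

# Assumption: a timestamp is a whole line such as 0:05, 12:34 or 1:02:03.
TIMESTAMP = re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$")


def strip_timestamps(pasted: str) -> str:
    """Drop timestamp-only lines and join the caption lines with spaces."""
    kept = []
    for line in pasted.splitlines():
        line = line.strip()
        # Skip blank lines and lines that are nothing but a timestamp.
        if line and not TIMESTAMP.match(line):
            kept.append(line)
    return " ".join(kept)


if __name__ == "__main__":
    sample = "0:00\nhey guys today we're comparing\n0:03\ntwo phones side by side"
    print(strip_timestamps(sample))
```

The result is one continuous block of text, which is easier to read along with while you listen and correct.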


Then I clicked on the ‘Choose YouTube video’ button and provided the link to my video. After that the service goes into editing mode, allowing you to control the video using the keyboard, which makes editing extremely easy: press F1 to rewind a few seconds, F2 to skip forward a few seconds, and ESC to pause the video or start it again (when you start it, it automatically rewinds a couple of seconds, which is extremely handy).


When I started correcting the transcript, I was amazed to find that it was actually pretty accurate. Because oTranscribe is so easy to use and the auto-generated subs were quite accurate, I was able to edit the transcript in one listen, occasionally stopping or rewinding here and there, and the whole process took me only a few minutes longer than the video itself. Amazing!

Let me know if you try this!

Best wishes,
Olya

Comments
  1. Marc says:

    I’ve made corpora doing this. Not amazing but usable.

    • olyasergeeva says:

      Hey Marc. Did you use it with YouTube videos or with recorded interviews? I used it to transcribe recorded interviews for my Delta M3 and it took ages, no matter how good the tool was, which is why I didn’t expect editing existing subs to be such a piece of cake.
