Here’s a way to ‘patch up’ listening skills by ‘brute force’ that I’ve been trying out over the past few days. The idea is to create an ‘audio concordancer’ based on video files with transcripts, and use it to drill decoding of the top 100 words in English, along with some of their high-frequency combinations. These top 100 account for a staggering >50% of a typical English text (probably even more in speech), and my research(ish) shows that they are the ones that most consistently fail to be decoded. What is more, many of these high-frequency expressions seem to respond very well to drilling – recognizing just ~20–30 instances in a row, with immediate feedback, seems to do the trick. This would mean that drilling them is likely to make a difference, and just a few episodes of a series will provide a student with more than enough material for drills.
I had to come up with this because my current students live in a monolingual environment but need to understand native speakers speaking [with a range of accents] to each other over Skype, so they are begging for an efficient approach that would produce some ‘here and now’ results. As a non-native speaker who’s still struggling with some accents (although not with C2-level listening exams, oddly enough), I make an excellent guinea pig here. I’ve recently watched a few series with the unpleasant feeling that I was missing a lot (The Thick of It, BrE; Numb3rs, AmE), so I decided to try out some of the ground-breaking ideas of John Field, Richard Cauldwell and other researchers and see whether they’d help me with these particular TV shows and accents.
I found a program that, given a video file and a transcript, produces a collection of audio files, one for each line of the transcript.
Along with the audio files, the program creates a .tsv file recording which text corresponds to each audio file. This .tsv file can be opened in Notepad and copied into Excel, and then you can filter it to find just those lines that contain a specific word or expression:
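If you’d rather skip the Excel step, the same filtering can be scripted. Here’s a minimal Python sketch; the two-column layout (audio filename, then transcript text) is an assumption on my part – check what your tool actually outputs and adjust accordingly:

```python
# Hedged sketch: filter the concordancer's .tsv for lines containing a target
# word. Assumed layout (may differ in your tool): filename <TAB> transcript.
import csv
import re

def lines_with(tsv_path, word):
    """Return (filename, text) pairs whose text contains `word` as a whole word."""
    # \b keeps 'can' from matching inside 'cannot'; note it still matches
    # the 'can' in "can't", which may or may not be what you want.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2 and pattern.search(row[1]):
                hits.append((row[0], row[1]))
    return hits
```

The filenames from the result can then go straight into the filenames.txt used by the copy script below, with no spreadsheet involved.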
I also wrote a simple script that copies all the files listed in a text document to a sub-folder, so I could listen to only those lines that contained the word/expression. The ‘task’ I set myself was to catch the word/expression in question in the line, without relying on the context. If I couldn’t, I replayed the line a few times and then checked the transcript.
(By the way, if you want to try this, here’s how to use the script:
- put the .mp3s produced by srt2srs into C:\Listening\media
- put a file named filenames.txt into C:\Listening and paste into it the list of filenames from your .xlsx document
- create a folder C:\Listening\training and a subfolder called ‘currentsearch’
- Copy the following text into a text document and change its extension to .bat
for /f "delims=" %%i in (C:\Listening\filenames.txt) do echo D|xcopy "C:\Listening\media\%%i" "C:\Listening\training\currentsearch"
- double-click the .bat file; wait until the files are copied – you’ll find them in the ‘currentsearch’ folder.)
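If you’re not on Windows (or just prefer not to fiddle with .bat files), here’s a rough Python equivalent of the copy step. The folder layout mirrors the one above, but the paths in the commented-out call are only placeholders – point them wherever your files actually live:

```python
# Hedged sketch: a cross-platform stand-in for the .bat one-liner above.
import shutil
from pathlib import Path

def copy_listed_files(list_file, media_dir, dest_dir):
    """Copy every file named in list_file (one name per line) from media_dir to dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create 'currentsearch' if missing
    copied = []
    for name in Path(list_file).read_text(encoding="utf-8").splitlines():
        name = name.strip()
        if not name:
            continue  # skip blank lines in the list
        src = Path(media_dir) / name
        if src.exists():  # silently skip names that aren't in the media folder
            shutil.copy2(src, dest / name)
            copied.append(name)
    return copied

# Example call, assuming the folder layout from the steps above:
# copy_listed_files(r"C:\Listening\filenames.txt",
#                   r"C:\Listening\media",
#                   r"C:\Listening\training\currentsearch")
```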
I’ve been experimenting for 10 days and here are some surprising facts I’ve learnt and noticed.
First, having analyzed a few lines that I’d failed to catch, I noticed that very often I had actually failed to catch not a whole stretch of speech, but just one word/expression that is highly frequent and whose pronunciation turns out to be completely at odds with my expectations. The pronunciation of ‘can’ (the 53rd most frequent word in English, occurring in ~2% of all sentences) – an almost inaudible /kn/ – was a bit of a shock. ‘Do you’, ‘he’ and ‘him’ were also challenging. However, having practiced with just 20 to 30 lines (using the transcript for feedback), I learnt to catch the expression over 90% of the time. The same result seemed to be reproduced with the students and friends I’ve tried this with so far: after about 20–30 samples they were already consistently catching the weak words (‘that’, ‘there’s’, ‘can’) that they couldn’t hear at all in the first ~5 lines. The only word this hasn’t worked for so far is ‘will’ – that one is really hellishly difficult to catch.
Having practiced listening to just the lines that contained ‘can’, ‘do you’ and ‘he’/‘him’ (3 separate drills), I tried watching an episode. Two things struck me: first, I ‘homed in’ on all the occurrences of ‘can’, and second, there were quite a few instances of ‘lagged’ decoding – it felt like I had a bit more processing capacity/confidence and could use it to decipher a few things I’d missed. Amazingly, I understood almost the entire episode, with the exception of just a few (~5) utterances.
Later I examined a line I’d failed to catch from an American series called ‘Numb3rs’ – also quite challenging for me – and there the breakdown turned out to be caused by the chunk ‘there’s’ (the 100th most frequent word in my home-grown corpus, occurring in >1% of all sentences). As it turns out, it is often reduced to /ðz/ – again, something that my 10+ years of post-CPE exposure to authentic speech hadn’t taught me. I listened to more lines with ‘there’s’, and my intuition tells me that previously I wouldn’t have caught it in over half of them.
I’ve tried this task with my upper-intermediate students, and they too mostly failed to hear ‘can’, and also ‘that’ (the 8th most frequent word, occurring in ~9% of lines), but, as I mentioned above, they seemed to respond well to the drills.
We haven’t started on the rest of the 100 most common English words yet, but by now I’m expecting to see similar results for all words that tend to be pronounced with a glottal stop or that lose a weak vowel (which is most of them, given that ‘k’/‘g’ and ‘p’/‘b’ tend to be ‘produced’ silently, with no air let out, just like ‘t’ and ‘d’, and that the weak /ɪ/ tends to disappear at the beginning/end of a chunk, as well as from diphthongs).
One controversial potential implication here is that maybe lower-level students should be given practice in decoding weak words in challenging contexts with lots of unknown lexis, so that they’d learn to catch these words rather than reconstruct them from the surrounding text. Of course, this can’t work 100% of the time, as quite a few of the weak forms are homophones (‘of’ and ‘have’, for instance), but still, choosing between homophonous forms seems to impose less of a processing burden than missing the word altogether and reconstructing it solely from the context. However, this could be quite a dispiriting exercise, so I’ll first try it out with more ‘prominent’ words, and I’ll work extra hard to explain why we’re doing this and why catching just one word is actually a great achievement here.