Creating an audio concordancer to teach listening comprehension

Posted: February 2, 2014 in ELT methodology

Here’s a way to ‘patch up’ listening skills by ‘brute force’ that I’ve been trying out over the past few days. The idea is to create an ‘audio concordancer’ based on video files with transcripts, and use it to drill decoding of the top 100 words in English, along with some of their high-frequency combinations. These top 100 words account for a staggering >50% of a typical English text (probably even more in speech), and my research(ish) shows that they are the ones that most consistently fail to be decoded. What is more, many of these high-frequency expressions seem to respond very well to drilling – recognizing just 20 to 30 instances in a row, with immediate feedback, seems to do the trick. This would mean that drilling them is likely to make a difference, and just a few episodes of a series will provide a student with more than enough material for drills.

I had to come up with this because my current students live in a monolingual environment but need to understand native speakers talking to each other [with a range of accents] over Skype, so they are begging for an efficient approach that would produce some ‘here and now’ results. As a non-native speaker who’s still struggling with some accents (although not with C2-level listening exams, oddly enough), I make an excellent guinea pig here. I’ve recently watched a few series that left me with the unpleasant feeling that I was missing a lot (The Thick of It, BrE; Numb3rs, AmE), so I decided to try out some of the ground-breaking ideas of John Field, Richard Cauldwell and other researchers and see whether they’d help me with these particular TV shows and accents.

I found a program that, given a video file and a transcript, produces a collection of audio files, one audio file for each line of the transcript.


Along with the audio files, the program creates a .tsv file that records which text corresponds to each audio file. This .tsv file can be opened in Notepad and copied to Excel, and then you can filter it to find just those lines that contain a specific word or expression:
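For anyone who would rather skip Excel, the same filtering step can be sketched in a few lines of Python. This is only an illustration, not the method used in this post: the function name `find_lines` is made up, and the column positions (`audio_col`, `text_col`) are assumptions, since the layout of the .tsv may differ from the sample below. Check your own file and adjust them.

```python
import csv
import io
import re


def find_lines(tsv_file, expression, audio_col=0, text_col=1):
    """Return (audio_name, text) pairs whose text contains the expression.

    The expression is matched as a whole word or phrase, ignoring case,
    so searching for 'can' does not match 'cannot'.
    audio_col and text_col are assumed column positions, not a known format.
    """
    pattern = re.compile(
        r"(?<!\w)" + re.escape(expression) + r"(?!\w)", re.IGNORECASE
    )
    hits = []
    # QUOTE_NONE keeps quotation marks inside subtitle lines from confusing the parser.
    for row in csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) <= max(audio_col, text_col):
            continue  # skip short or empty rows
        if pattern.search(row[text_col]):
            hits.append((row[audio_col], row[text_col]))
    return hits


# Made-up sample data standing in for a real .tsv file.
sample = io.StringIO(
    "ep1_001.mp3\tCan you hear me?\n"
    "ep1_002.mp3\tI cannot say.\n"
    "ep1_003.mp3\tYes, we can.\n"
)
matches = find_lines(sample, "can")
for audio_name, text in matches:
    print(audio_name, text)
```

With a real file, you would pass `open("your_file.tsv", encoding="utf-8", newline="")` instead of the sample, and write the audio names from `matches` into the text file used in the copying step.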


I also wrote a simple script that copies all files listed in a text document to a sub-folder, so now I could listen only to those lines that contained the word/expression. The ‘task’ I set myself was to catch the word/expression in question in the line, without relying on the context. If I couldn’t, I replayed the line a few times and then checked the transcript.

(By the way, if you want to try this, here’s how to use the script:

  • put the .mp3s produced by subs2srs into C:\Listening\media
  • put a file named filelist.txt into C:\Listening and paste into it the list of filenames from your .xlsx document, one per line
  • create a folder C:\Listening\training and, inside it, a subfolder called ‘currentsearch’
  • copy the following text into a text document and change its extension to .bat (the quotation marks must be straight quotes, not curly ones)
    for /f "delims=" %%i in (C:\Listening\filelist.txt) do echo D|xcopy "C:\Listening\media\%%i" "C:\Listening\training\currentsearch"
  • double-click the .bat file and wait until the files are copied – you’ll find them in ‘currentsearch’)
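If you are not on Windows, or would like to see which listed files could not be found, the copying step above can also be sketched in Python. This is a rough equivalent rather than the script I actually used; the function name `copy_listed_files` is invented for the example, and the demo runs in a temporary folder with fake files so that nothing on your disk is touched.

```python
import shutil
import tempfile
from pathlib import Path


def copy_listed_files(list_file, media_dir, target_dir):
    """Copy every file named in list_file from media_dir into target_dir.

    Returns the names that were listed but not found in media_dir.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in list_file.read_text(encoding="utf-8").splitlines():
        name = name.strip()
        if not name:
            continue  # ignore blank lines in the list
        source = media_dir / name
        if source.is_file():
            shutil.copy2(source, target_dir / name)
        else:
            missing.append(name)
    return missing


# Demo with made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    media = root / "media"
    media.mkdir()
    (media / "ep1_001.mp3").write_bytes(b"fake audio")
    (media / "ep1_002.mp3").write_bytes(b"fake audio")

    listing = root / "filelist.txt"
    listing.write_text("ep1_001.mp3\nep1_404.mp3\n", encoding="utf-8")

    target = root / "training" / "currentsearch"
    missing = copy_listed_files(listing, media, target)
    copied = sorted(p.name for p in target.iterdir())
    print("copied:", copied)
    print("missing:", missing)
```

To use it with the folders from the steps above, you would call it with `Path(r"C:\Listening\filelist.txt")`, `Path(r"C:\Listening\media")` and `Path(r"C:\Listening\training\currentsearch")`.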

I’ve been experimenting for 10 days and here are some surprising facts I’ve learnt and noticed.

First, having analyzed a few lines that I’d failed to catch, I noticed that very often I had actually failed to catch not a whole stretch of speech, but just one highly frequent word/expression whose pronunciation turned out to be completely at odds with my expectations. The pronunciation of ‘can’ (the 53rd most frequent word in English, occurring in ~2% of all sentences) – an almost inaudible /kn/ – was a bit of a shock. ‘Do you’, ‘he’ and ‘him’ were also challenging. However, having practiced with just 20 to 30 lines (using the transcript for feedback), I learnt to catch the expression over 90% of the time. The same results seem to be reproduced with the students and friends I’ve tried this with so far: after about 20 to 30 samples they were already consistently catching the weak words (‘that’, ‘there’s’, ‘can’) that they couldn’t hear at all in the first ~5 lines. The only word with which this hasn’t worked so far is ‘will’ – this one is really hellishly difficult to catch.

Having practiced listening to just the lines that contained ‘can’, ‘do you’ and ‘he / him’ (3 separate drills), I tried to watch an episode. Two things struck me: first, I ‘homed in’ on all occurrences of ‘can’, and second, there were quite a few instances of ‘lagged’ decoding – it felt like I had a bit more processing capacity/confidence and could use this to decipher a few things I’d missed. Amazingly, I understood almost the entire episode, with the exception of just a few (~5) utterances.

Later I examined a line I’d failed to catch from an American series called ‘Numb3rs’ – also quite challenging for me – and there the breakdown turned out to be caused by the chunk ‘there’s’ (the 100th most frequent item in my home-grown corpus, occurring in >1% of all sentences). As it turned out, this one is often reduced to /ðz/ – again, something that my 10+ years of post-CPE exposure to authentic speech hadn’t taught me. I listened to more lines with ‘there’s’, and my intuition tells me that previously I wouldn’t have caught it in over half of them.
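Figures like ‘occurs in >1% of all sentences’ are easy to estimate for your own set of transcripts. The sketch below is just one way to do it and not necessarily how my corpus numbers were produced: the function names are made up, the four sample lines are invented, and a ‘word’ here is simply a run of letters and apostrophes.

```python
import re
from collections import Counter


def normalize(text):
    """Lower-case the text and turn curly apostrophes into straight ones."""
    return text.lower().replace("\u2019", "'")


def line_share(lines, expression):
    """Return the fraction of lines that contain the expression as a whole word/phrase."""
    if not lines:
        return 0.0
    pattern = re.compile(
        r"(?<![\w'])" + re.escape(normalize(expression)) + r"(?![\w'])"
    )
    hits = sum(1 for line in lines if pattern.search(normalize(line)))
    return hits / len(lines)


def rank_by_line_frequency(lines):
    """Rank words by the number of lines they occur in (each line counts once per word)."""
    counts = Counter()
    for line in lines:
        counts.update(set(re.findall(r"[a-z']+", normalize(line))))
    return counts.most_common()


# Invented sample lines standing in for a real transcript.
lines = [
    "There\u2019s a problem.",
    "Can you see it?",
    "I think there's more.",
    "No.",
]
share = line_share(lines, "there's")
ranking = rank_by_line_frequency(lines)
print(f"'there's' occurs in {share:.0%} of lines")
print("most widespread word:", ranking[0])
```

Counting lines rather than individual tokens matches the way the drills work: what matters is how many separate audio clips you will get for a given word or chunk.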

I’ve tried this task with my upper-intermediate students, and they too mostly failed to hear ‘can’, and also ‘that’ (the 8th most frequent word, occurring in ~9% of lines), but, as I mentioned above, they seemed to respond well to drills.

We haven’t started on the rest of the 100 most common English words yet, but by now I’m expecting to see similar results for all words that tend to be pronounced with a glottal stop or that lose a weak vowel (which is most of them, given that ‘k’/‘g’ and ‘p’/‘b’ tend to be ‘produced’ silently, with no air let out, just like ‘t’ and ‘d’, and that the weak /ɪ/ tends to disappear at the beginning/end of a chunk, as well as from diphthongs).

Rank Word
1 th-
2 be/are
3 t-
4 [-]f (also, /v/ might be […])
5 [a]n[d]
6 a
7 [i]n
8 th-[t]
9 [h]a[ve]
10 I /a/
11 [i][t]
12 f[or]
13 n-[t]
14 on
15 w[i][th]
16 [h]e
17 as
18 y[ou]
19 do
20 a[t]
21 th-s
22 b[-][t]
23 [h]is
24 by
25 fr-m
26 the[y]
27 w[e]
28 say
29 [h]er
30 she
31 or
32 [a]n
33 [w]-ll
34 my /ma/
35 one
36 all
37 w[oul][d]
38 th[ere]
39 th[eir]
40 wha[t]
41 so /s-/
42 u[p]
43 ou[t] /at/
44 [i]f
45 [a]bou[t] /ba/
46 wh-
47 get
48 which
49 go
50 me
51 wh-n
52 ma[ke]
53 c-n
54 li[ke]
55 time
56 no
57 j-s[t]
58 [h]im
59 know
60 ta[ke]
61 p[eo]ple
62 [i]nt[o]
63 ye[ar]
64 y[our]
65 g[-][d]
66 s-me
67 c-ld
68 th-m
69 see
70 oth[er]
71 th[a]n
72 th[e]n
73 now /na/
74 l-[k]
75 onl[y]
76 c-me
77 [i]ts
78 over
79 th-n[k]
80 also
81 ba[ck]
82 aft[er]
83 use
84 tw[o]
85 how /ha/
86 our /a/
87 wor[k]
88 firs[t]
89 well
90 way[i]
91 even
92 n[j]ew
93 wan[t]
94 [be]cause
95 [a]ny
96 these
97 gi[ve]
98 day
99 mos[t]
100 əs [us]

One controversial potential implication here is that maybe lower-level students should be given practice in decoding weak words in challenging contexts with lots of unknown lexis, so that they learn to catch these words rather than reconstruct them from the surrounding text. Of course, this can’t work 100% of the time, as quite a few of the weak forms are homophones (‘of’ and ‘have’, for instance), but still, choosing one of the homophonous forms seems to constitute less of a processing burden than missing the word altogether and reconstructing it based solely on the context. However, this could be quite a dispiriting exercise, so I’ll first try it out with more ‘prominent’ words and will work extra hard to explain why we’re doing this and why just catching one word is actually a great achievement here.

  1. olyasergeeva says:


    We’ve been using this approach in a listening course and it has helped me personally with some accents (there seems to be some kind of knock-on effect, because I’ve even had noticeable progress with accents that weren’t present in the audios).

  2. […] film or TV show. You need two programs subs2srs and Anki. I first saw the reference to subs2srs via a post by Olya Sergeeva, a great read by the […]
