The Social Science Research Center at DePaul has a micro-lab where researchers (or their graduate students) can access hardware and software to transcribe audio files. Typically, researchers have used these tools to transcribe interviews and focus groups. The process is relatively simple: researchers bring their audio files on portable media, and the files are loaded onto a machine in the micro-lab. This machine has software called “Express Scribe” and a foot pedal. The pedal is used to stop, start, rewind, and fast-forward the audio within Express Scribe. Additionally, the speed of the audio playback can be adjusted. In all, this is a great tool and process for individuals to transcribe audio files. However, it is not without its flaws. The main flaw is that it requires users to be in the physical space during business hours. It also requires that someone spend the time actually typing out the transcription.
In this post, I review two relatively new transcription tools and demonstrate how they might be used to help researchers transcribe spoken language.
The first, oTranscribe, is a web-based transcription tool. You upload an audio file and control playback from within the web page. Keep in mind that a researcher doing this on their own (without coming to the SSRC to use our machine and pedal) would have to play the audio in something like iTunes and type the text in a separate program (like MS Word). That is likely fine if you’re working on a machine with two monitors. Even so, stopping and restarting the audio can be quite cumbersome with this approach, even if you have figured out how to use hotkeys and shortcuts. Remember that hotkeys usually work only while you are in the program they belong to. So, you’re typing in MS Word, but to get the audio to stop you have to go back to iTunes with the mouse and press stop (or click in the iTunes window and use a hotkey to stop the audio).
oTranscribe allows you to do this all in the same place. Even better, when the audio is restarted, it repeats the last bit before where you left off. This gives you a chance to get your hands in place and makes it much easier to reorient yourself. In the default setup, the key to stop and start the audio is ESC, but you can change that. Additionally, the audio can be slowed down quite a lot. I have demonstrated what the process is like here.
I recorded myself reading the beginning of a chapter in Howard Becker’s Writing for Social Scientists on an iPhone (using the Voice Memos app). Although it sounds like I might be drunk, I am actually not. I have slowed the audio down enough so that I can keep up typing it.
Overall, not a terribly onerous process. I think it beats having to toggle back and forth between different programs.
The second tool, Scribe, does automatic transcription. According to Poynter, it was developed by some students working on a school project. One of the students had to transcribe 12 interviews, and he didn’t want to do it (who does?). He built a script that uses the Google Speech API to transcribe the speech to text. Scribe is based in Europe, and its website asks the user to upload an mp3 and provide an email address. The cost to have the file transcribed is €0.09 per minute. As of now, there is a limit on how long the audio file can be (80 minutes). Because the Voice Memos app saves audio in MPEG-4 format, I had to convert my file to mp3 before it could be uploaded. Once this was done, I received an email with a link to my text when the transcription was finished.
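For readers facing the same conversion step, here is a minimal sketch of one way to do it. It assumes the free command-line tool ffmpeg is installed (with MP3 support via libmp3lame) and is not necessarily the method used for this post. The function names and the file name `interview.m4a` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: str) -> list[str]:
    """Build an ffmpeg command that writes an MP3 next to the source file."""
    src = Path(source)
    target = src.with_suffix(".mp3")
    # -i: input file; -vn: ignore any video stream;
    # -codec:a libmp3lame: encode the audio stream as MP3
    return ["ffmpeg", "-i", str(src), "-vn", "-codec:a", "libmp3lame", str(target)]


def convert_to_mp3(source: str) -> Path:
    """Run the conversion; assumes ffmpeg is available on the PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    command = build_ffmpeg_command(source)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(command[-1])


if __name__ == "__main__":
    # Hypothetical file name for a Voice Memos recording
    print(build_ffmpeg_command("interview.m4a"))
```

Calling `convert_to_mp3("interview.m4a")` would leave `interview.mp3` in the same folder, ready to upload.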
Below is the unedited output that I received. I pasted the text into OneNote so that I could add highlighting and comments.
In all, I am fairly impressed with the output from Scribe. Obviously, there are some problems with it. The text is generally right and is organized in paragraphs, but not naturally. For example, the second paragraph is separated from the first, when they should have been kept together. There were periods at the end of the paragraphs. There is also some random capitalization (e.g., “The Chronic”). Remarkably, names were capitalized (Kelly and Merten). My guess is that mix-ups like chutzpah/hot spot and vaudeville/the wave auto are fairly common with words borrowed from other languages.
Obviously, the text will need some review and editing, but I think that in the long run it would be faster to correct mistakes than to type the transcription manually. While I think Scribe works well for interviews, I am not sure how well it would work for focus groups. The kicker for me is how cheap it is: at €0.09 per minute, an 80-minute interview could be transcribed for less than $10.00.
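For anyone budgeting a project, the price arithmetic can be sketched in a few lines of Python. The rate and the 80-minute limit are the figures quoted in this post. The function name is hypothetical, and the result is in euros, so the dollar amount depends on the exchange rate at the time.

```python
from decimal import Decimal

RATE_EUR_PER_MINUTE = Decimal("0.09")  # price quoted in this post
MAX_MINUTES = 80                       # file-length limit quoted in this post


def transcription_cost_eur(minutes: int) -> Decimal:
    """Return the cost in euros of transcribing one file of the given length."""
    if not 0 < minutes <= MAX_MINUTES:
        raise ValueError(f"length must be between 1 and {MAX_MINUTES} minutes")
    return RATE_EUR_PER_MINUTE * minutes


if __name__ == "__main__":
    print(transcription_cost_eur(80))  # prints 7.20
```

So a file at the maximum length comes to €7.20, before any currency conversion.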
I think that both oTranscribe and Scribe lower the barrier to entry for researchers wanting to transcribe audio material.