Transcription Tools

The Social Science Research Center at DePaul has a micro-lab where researchers (or their graduate students) can access hardware and software to transcribe audio files.  Typically, researchers have used these tools to transcribe interviews and focus groups.  The process is relatively simple: researchers bring their audio files on portable media, which are loaded onto a machine in the micro-lab.  This machine has software called “Express Scribe” and a foot pedal.  The pedal is used to stop, start, rewind, and fast-forward the audio within Express Scribe, and the speed of playback can also be adjusted.  In all, this is a great tool and process for individuals transcribing audio files.  However, it is not without its flaws.  The main one is that it requires users to be in the physical space during business hours.  It also requires that someone spend the time actually typing the text of the transcription.

In this post, I review two relatively new transcription tools and demonstrate how they might be used to help researchers transcribe spoken language.

The first, oTranscribe, is a web-based transcription tool.  You upload an audio file and control playback from within the web page.  Keep in mind that if a researcher were going to do this on their own (without coming to the SSRC to use our machine and pedal), it would require playing the audio in something like iTunes and typing the text in a text editor (like MS Word).  That is likely fine if you’re working on a machine with two monitors.  Even so, stopping and restarting the audio file can be quite cumbersome with this approach, even if you have figured out how to use hotkeys and shortcuts.  Remember that hotkeys usually require that you be in the program to use them.  So you’re typing in MS Word, but to stop the audio you have to get back to iTunes with the mouse and actually press stop (or click into the iTunes window and use a hotkey to stop the audio file).

oTranscribe lets you do all of this in one place.  Even better, when the audio is restarted, it repeats the last bit from where you left off.  This gives you a chance to get your hands in place and makes it much easier to reorient yourself.  In the default setup, the key to stop and start the audio is ESC, but you can change that.  Additionally, the audio can be slowed down quite a lot.  I have demonstrated what the process is like here.

I recorded myself on an iPhone (using the Voice Memos app) reading the beginning of a chapter of Howard Becker’s Writing for Social Scientists.  Although it sounds like I might be drunk, I am actually not; I have simply slowed the audio down enough that I can keep up while typing.

Overall, it is not a terribly onerous process, and I think it beats toggling back and forth between different programs.

I also learned about Scribe, a tool that does automatic transcription.  According to Poynter, it was developed by some students working on a school project.  One of the students had to transcribe 12 interviews, and he didn’t want to do it (who does?), so he built a script that uses the Google Speech API to transcribe the speech to text.  Based in Europe, the Scribe website asks a user to upload an mp3 and provide an email address.  The cost to have the file transcribed is €0.09 per minute.  As of now, there is a limit to how long the audio file can be (80 minutes).  Because the Voice Memos app saves files in MPEG-4 format, I actually had to convert my audio file to mp3 before it could be uploaded.  Once this was done, I received an email with a link to my text when the transcription was finished.

Below is the unedited output that I received.  I pasted the text into OneNote so that I could add highlighting and comments.

[Image: Scribe’s unedited output, pasted into OneNote with highlighting and comments]

In all, I am fairly impressed with the output from Scribe.  Obviously, there are some problems with it.  The text is generally right and organized in paragraphs, though not always naturally.  For example, the second paragraph is separated from the first when the two should have been kept together.  Periods were placed at the ends of paragraphs, and there is some random capitalization (e.g., “The Chronic”).  On the other hand, names (Kelly and Merten) were capitalized correctly, which I thought was remarkable.  My guess is that mix-ups like chutzpah/hot spot and vaudeville/the wave auto are fairly common with words borrowed from other languages.

Obviously, the text will need a little work, and while I think Scribe works well for interviews, I am not sure how well it would handle focus groups.  The text needs review and editing, but in the long run I think it would be faster to correct Scribe’s mistakes than to type the transcription manually.  The kicker for me is how cheap it is: at €0.09 per minute, an 80-minute interview can be transcribed for about €7.20 (less than $10.00).

I think that both oTranscribe and Scribe lower the barrier to entry for researchers wanting to transcribe audio material.


Are Chicago’s Safe Passage Routes Located in the Highest Risk Areas?

Safe passage routes to school not only provide a sense of safety for Chicago students from pre-K through high school; they also reduce crime involving students and help increase school attendance. Chicago’s Safe Passage program was introduced in 2009 after 16-year-old Fenger High School honors student Derrion Albert was beaten to death by gang members, an attack captured on cell phone video. His death and its circumstances received national attention, along with a series of other incidents involving CPS students caught in gang violence. Since then, the program has expanded to include schools, parents, residents, law enforcement officials, and even local businesses in efforts to provide students with a safe environment. The 51 safe route programs currently in place take several forms: safe haven programs, in which students who fear for their safety can find refuge at the local police station, fire house, library, and even convenience stores, barbershops, and restaurants; patrols along school routes by veterans, parents, and local residents; and walking-to-school programs, in which parents and local residents create a presence to help deter unlawful incidents.

The map below shows the number of all crimes committed in the city of Chicago during the current school year, along with the locations of schools and safe routes in the communities that have them. Currently, there are 517 Chicago public schools, of which only 136 (26.3% of all schools) fall within the 51 safe routes. Although the safe routes are located in 37 of the high-crime communities generally (the south, west, and northeast sides of Chicago), they do not exist in the pockets with the highest crime incidence (1,500+ incidents, highlighted in burgundy), where children are the most vulnerable. Of the 47 schools that fall within these extreme-crime areas (1,500+ incidents a year), only 6 have safe routes; the others offer no safe passage options. A list of the schools appears at the end of this post.



[Map: Chicago crime incidents during the current school year, with school locations and Safe Passage routes]

Schools located in extremely high-crime areas of Chicago (Schools highlighted in green have safe passage routes):
Bennett, Bowen HS, Bradwell, Camelot Safe – Garfield Park, Camelot Safe Academy, Clark HS, Coles, Community, Ericson, Frazier Charter, Frazier Prospective, Galapagos Charter, Great Lakes Charter, Gregory, Harlan HS, Hefferan, Heroes, Herzl, Hirsch HS, Hubbard HS, Learn Charter – Butler, Leland, Mann, Mireles, Noble Charter – Academy, Noble Charter – Baker College Prep, Noble Charter – DRW, Noble Charter – Muchin, Noble Charter – Rowe Clark, Oglesby, Plato, Polaris Charter, Powell, Schmid, Shabazz Charter – Shabazz, Smith, South Shore Intl HS, Webster, Westcott, Winnie Mandela HS, YCCS Charter – Association House, YCCS Charter – CCA Academy, YCCS Charter – Community Service, YCCS Charter – Innovations, YCCS Charter – Olive Harvey, YCCS Charter – Sullivan, YCCS Charter – Youth Development


Implementing visualization techniques in faculty research
The map above illustrates the different visualization techniques that can be used to effectively convey data or research conclusions to different types of audiences across disciplines and industries. Visualizations can help identify existing or emerging trends, spot irregularities or obscure patterns, and even address or solve problems.

Ask us how to visualize your research
For help visualizing your own research findings, or to find out whether your research lends itself to similar techniques (including data acquisition and pre-processing of both quantitative and qualitative data), contact Nandhini Gulasingam at mgulasin@depaul.edu.

Stata’s datasignature command

As we have discussed before, Stata file management can be tricky.  There is the incessant, iterative updating of files.  Different kinds of files do different things: data cleaning, data management, item creation, descriptive analyses, regression analyses.  And then there is version management.


Stata has a handy little command called datasignature.  Long story short, datasignature protects the integrity of your data.  When the command is executed, Stata generates a signature string based on checksums describing your dataset, including the number of observations, the number of variables, the variable names, and the values of the variables.

The next time you load the dataset, you can run the datasignature confirm command, and Stata will report whether the data have changed since the signature was last set.  If they haven’t changed, Stata reports “(data unchanged since ________)”, filling in the date.  If the data have changed, datasignature raises an error noting that they differ from the last signed version.
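A minimal sketch of the workflow, using Stata’s bundled auto dataset (the file name here is illustrative):

```stata
* load a dataset and store its signature with the data
sysuse auto, clear
datasignature set
save myproject_data, replace

* later, in another session: reload the file and verify integrity
use myproject_data, clear
datasignature confirm
* reports "(data unchanged since ...)" if nothing has changed;
* if observations, variables, or values differ, it exits with an error
```

Note that datasignature set stores the signature in the dataset itself, so you need to save the file afterward for the signature to persist across sessions.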

Why might this be important?

This might be crucial for teams of researchers collaborating on a large analysis project, particularly when multiple people work with the cleaning, management, and analysis files, and not all of them share the same level of concern for hygienic data management.  It can become a problem when someone comes back to a dataset without realizing it has changed.  datasignature can help eliminate this problem.

UMass Grad Student Gets International Fame and Glory After Discovery of Excel Error in Important Study

Note to self: when your research will be used as the basis of national economic policy, it might be a good idea to have someone else take a look-see at your calculations before you click the “submit” button.

Thomas Herndon, a UMass Amherst graduate student in economics, is the lead author of a paper that critiques a very influential 2010 study on public debt conducted by two Harvard professors, Carmen Reinhart and Kenneth Rogoff.  Their study contended that a high debt-to-GDP ratio (greater than 90%) causes slow economic growth, and it has been used as the basis for austerity measures in struggling economies.

Herndon’s efforts were the product of a term paper assignment in which he was directed to replicate an economic study.  When he received the data from one of the authors, he noticed some irregularities in the spreadsheet.

Mike Konczal of the Next New Deal blog broke the story (located here).  Needless to say, there has been quite the hullabaloo surrounding Herndon.  Overnight, he has become quite the rock star; he even appeared on the Colbert Report, which is about as much recognition as an econ grad student could hope for.

On the Importance of Rock Solid Methods…

I know it’s a bit after the fact, but did anyone else catch Antonin Scalia alluding to Mark Regnerus’s (widely debunked) “research” on the detrimental effects of gay parenting on children’s outcomes?  NPR has a transcript and audio from the oral argument on March 26, 2013.  I’ve pulled out the section where Justice Scalia mentions the work (not by name), but we all know who he is talking about.

JUSTICE SCALIA: Mr. Cooper, let me — let me give you one — one concrete thing. I don’t know why you don’t mention some concrete things. If you redefine marriage to include same-sex couples, you must — you must permit adoption by same-sex couples, and there’s — there’s considerable disagreement among — among sociologists as to what the consequences of raising a child in a — in a single-sex family, whether that is harmful to the child or not. Some States do not — do not permit adoption by same-sex couples for that reason.

Justice Scalia’s comments are the very reason why, in social science, we have to be so careful with what we publish.  I believe the way we think about research has changed.  Most of us who conduct research in academia do so with the idea that we want to make the world a better place.  That is why we went to graduate school and toiled through low pay and long hours.  We hope that we are improving someone’s conditions, their life, their world, and we want our research to contribute to that end.  I believe that most researchers are trying to do that.

However, some of us get myopic about our research and don’t necessarily appreciate the context in which it will be received.  Of course, if we considered ONLY the socio-cultural ramifications of the research we publish, then many of us wouldn’t publish at all.  Think about it: in addition to the soul-grinding process that academic writing can be, we would also have to weigh whether our work will be well received and whether anything will change.  How many babies have died in the US since the American Academy of Pediatrics began the “Back to Sleep” initiative in 1994?  Whatever the number, it’s too many.  How many unrestrained passengers are killed every year in motor vehicle accidents, despite the fact that every vehicle comes equipped with safety belts?  Too many.  If our only measure of success were the fact that many babies STILL sleep in unsafe sleeping conditions, or that people continue to ride in cars without wearing seatbelts, then we might conclude that research on these matters does little to change the socio-cultural behaviors that influence the phenomena we study.

Obviously, this kind of thinking isn’t usually entertained for long by prolific and productive academic scholars.  Their work serves as the narrative of our social reality, which policy makers must be able to take at face value.  What’s more, when work is the product of a hurried review process, sloppy methodology, or questionable ethical relationships (in Regnerus’s case, with a funder), it is indistinguishable from the rest of the body of research (publications with a deliberate review process, solid methodology, and no unethical relationships).

This issue has become more important than ever.  In our digitally connected world, where the line between “opinion/commentary” and “fact” is blurry and varies according to who is involved, our research is being used in ways we may never have imagined.  In this particular case, Regnerus’s research became a tool in the latest battle between the right and the left: civil rights for homosexuals.  And that itself is not the problem; this is why we do research, to contribute to the national dialogue that leads to change and improves people’s lives.  The problem is that, in this case, conclusions based on methodologically weak research are being used to validate the unequal treatment of Americans.  And that is deplorable.

Do John Schools Really Decrease Recidivism?

Before Rachel left Chicago to work at Case Western Reserve University’s Begun Center for Violence Prevention Education and Research, she put the finishing touches on this report, authored with Ann Jordan of the Rights Work Initiative at American University. We welcome your comments.

Do John Schools Really Decrease Recidivism?
A methodological critique of an evaluation of the San Francisco First Offender Prostitution Program

by Rachel Lovell and Ann Jordan

A growing number of governments are creating “john schools” in the belief that providing men with information about prostitution will stop them from buying sex, which will in turn stop prostitution and trafficking. John schools typically offer men arrested for soliciting paid sex the opportunity (for a fee) to attend lectures by health experts, law enforcement and former sex workers in exchange for cleared arrest records if they are not re-arrested within a certain period of time. A 2008 examination of the San Francisco john school, “Final Report on the Evaluation of the First Offender Prostitution Program,” claims to be the first study to prove that attending a john school leads to a lower rate of recidivism or re-arrest (Shively et al.). Despite its claims, the report offers no reliable evidence that the john school classes reduce the rate of re-arrests.

This paper analyzes the methodology and data used in the San Francisco study and concludes that serious flaws in the research design led the researchers to claim a large drop in re-arrest rates that, in fact, occurred before the john school was implemented.

Read the full report.

Kristen Miller: Question Design

Kristen Miller, the director of the CDC’s Question Design Research Lab, will be at DePaul next week sharing her survey know-how with anyone who wants to learn more about how survey research on a grand scale operates on the ground. Check out the schedule below and join us at the SSRC for a promising display of survey and methodological insights and derring-do.

Friday, February 10, 1 pm: Faculty Seminar
“Development and Evaluation of a Sexual Identity Measure for the National Health Interview Survey (NHIS)”
Miller will describe the use of qualitative research in developing a precise sexual identity measure for a large-scale quantitative survey and the resulting complications.

Monday, February 13, daytime: Lab Visits
Faculty are invited to schedule appointments to meet with Miller to discuss their research, questionnaire design, or other research questions. 

Monday, February 13, 6 – 7:30 pm: Public Lecture
“Question Evaluation at the National Center for Health Statistics”
This lecture, open to the public, will center on Miller’s work at the CDC and will consider examples of questions that inadvertently compromised data quality through a lack of rigorous evaluation.

I talked with Kristen today to learn more about what she does and why it matters.
