Naming Variables in Stata (and other Statistical Packages)

Naming variables in Stata and other statistical packages is a practice in balancing art and science.  Researchers and statisticians alike need variable names that are informative, but not so elaborate that one accidentally mis-specifies variables and produces messy, problematic analysis.  J. Scott Long’s The Workflow of Data Analysis Using Stata details the three types of naming systems that people usually use when naming variables.


Sequential names use a stub followed by sequential digits, like v7, v11, v013 or something more complicated like R0002203, R002205, etc.  The numbers might correspond to the order in which the data were collected or the questions were asked.  Because the names don’t have any meaning, it is easy to use the wrong variable in a set of analyses, and hard to remember the name of the variable you need.  Because this risk is very real and wastes a lot of time when you are trying to figure out where the wheels came off of your analysis, some researchers refer to a printed list of variable names, descriptive statistics and variable labels, such as the one produced by the command “codebook, compact”.

Source names are those that use information about where the variable came from as part of the name.  You might see this in a survey where you have questions q1, q2, q3.  A question that comes in multiple parts might get named something like q4a, q4b, q4c.  This would be Question 4, parts a, b, and c.  With source names, you might have variables that don’t fit into the scheme, because they pertain to some aspect of the data collection, like demographic variables or information about the time and site of data collection.  If these kinds of variables are part of your dataset, consider how they will be named before including them.  These types of names can be more useful than sequential names, but it can still be dodgy when you are looking at a complicated model with these types of names.

Mnemonic naming systems use abbreviations that convey the content of the variable (e.g., female, educ, id, state).  These can be much more useful because the name itself gives a clue about what the variable measures.  Even so, some planning is necessary because of the limitations of statistical packages.  For example, Stata allows names up to 32 characters long, but shortens long names when listing results.  By default, output tends to cut variables like familyincome_1990, familyincome_2000, and familyincome_2010 down to about 12 characters, so all three can show up as familyincome in analyses and results tables.  It is best to aim for variable names of no more than 12 characters, so that if your statistical package does truncate variable names, you can still tell which variable is which.  This might look like: faminc1990, faminc2000, faminc2010.
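One way to check a planned set of names against the 12-character guideline is a small script like the one below.  This is only a sketch in Python, not Stata: the function name truncation_collisions is made up for illustration, and it assumes a package simply keeps the first 12 characters of each name, which may not match how any particular package actually abbreviates.

```python
from collections import defaultdict


def truncation_collisions(names, width=12):
    """Group variable names that become identical when cut to `width` characters.

    Assumption: truncation keeps only the first `width` characters.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name[:width]].append(name)
    # Keep only the shortened forms shared by more than one variable
    return {short: members for short, members in groups.items() if len(members) > 1}


long_names = ["familyincome_1990", "familyincome_2000", "familyincome_2010"]
short_names = ["faminc1990", "faminc2000", "faminc2010"]

print(truncation_collisions(long_names))   # all three collapse to "familyincome"
print(truncation_collisions(short_names))  # no collisions: empty dict
```

An empty result means every name would still be distinguishable after being cut to the chosen width.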

Long also suggests that it might be useful to include indicators of the variable’s structure in the name.  You might use b=binary, i=indicator, n=negatively coded scale, and p=positively coded scale.  That way, when you see a variable like bdepres_cesd, you know without referring to a codebook that it is a binary item indicating depression based on the CESD.

One final note: be careful with capitalization.  Statistical packages deal with capital letters in different ways.  In Stata, Educ, EDUC, and educ are treated as three different variables.  This might not be too much of a problem for you, but consider what happens when you convert from one file format (Stata) to another (like Excel), which may drop extra information (like capitalization).  When this happens in Stata, if you have three variables like Educ, educ, and EDUC, the second and third variable names will get converted to something like varNUM, which can be confusing when you are trying to work with your data after a file conversion.
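Before converting a file, it can help to look for names that differ only by capitalization.  The following is a rough sketch in Python rather than Stata; the function name case_collisions and the example list are hypothetical, and the variable names would need to be exported from your own dataset first.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of variable names that differ only in capitalization."""
    groups = defaultdict(list)
    for name in names:
        # Lowercase each name so Educ, educ, and EDUC land in the same group
        groups[name.lower()].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


variables = ["Educ", "educ", "EDUC", "female", "state"]
print(case_collisions(variables))
```

Any group this returns is a set of names that a case-insensitive format could not keep apart, so those are the ones to rename before converting.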

Michael McIntyre at Jan. Mess Hall


Michael McIntyre, chair and associate professor in the Department of International Studies, previewed a conference paper he is preparing at an SSRC Mess Hall presentation on Jan. 25. After briefly summarizing key touchstones and names in the development of the field of international relations (IR), Michael challenged the common perception of E. H. Carr as IR’s “first realist.” In fact, what’s called the “first great debate” flowing from Carr’s indictment of utopian explanations of international politics in his 1939 classic, The Twenty Years’ Crisis, 1919-1939, never really occurred, Michael contends. In Michael’s revised reading, Carr’s post-World War I appeasement toward Germany; his dedication of The Twenty Years’ Crisis to Marxist-inspired Karl Mannheim, founder of the sociology of knowledge; and Carr’s later abandonment of the IR arena to work on his masterwork, a 14-volume, sympathetic history of Soviet Russia to 1929, all argue against the depiction of Carr as a proto IR realist. To the contrary, argues Michael, Carr was doing just what realist theorists warn against.

State of Minimum Wage$ in the U.S.

Since the U.S. instituted a federal minimum wage rate in 1938, various state and local governments have pushed for higher rates. Seattle was the first to increase its minimum wage to $15 an hour by 2017, through a $2 increase every year starting in 2015. San Francisco followed suit with an increase to $15 by 2018. In 2015, Oakland increased its rate to $12.25, and Chicago will slowly increase its minimum wage from $8.25 to $13 an hour by 2019. The rate in Washington, D.C. is currently $10.50 and will be increased to $11.50 by the end of 2016. The federal minimum wage has been $7.25 an hour since 2009.

According to a U.S. Bureau of Labor Statistics 2015 report, in 2014 (the latest year for which detailed data are available), 3.8% of all hourly workers 16 years and older (roughly 3 million workers) were paid at or below the federal minimum wage, with 1.6% at the federal level and 2.2% below. Women were 2.9% of the total and men 1.6%. A regional breakdown showed that 2.6%-2.8% of Southern workers fell below the federal minimum, with Louisiana reporting the highest percentage of workers (3.5%) making less than the minimum.

The following infographic shows the state of the minimum wage throughout the U.S.

Click through to see the enlarged image.

[Infographic: Minimum Wage]

Techniques Used
The above visualization includes 3 types of techniques:

Quantitative Analysis: Two chart types were used to visualize quantitative data on wages: a trend chart shows the historic U.S. minimum wages adjusted for inflation using 2015 CPI (consumer price index) and a bubble chart shows countries with hourly minimum wages higher than that of the U.S.

Statistical Analysis (GIS): The spatial analysis shows statistics ranging from basic counts, such as total characters and words, number of lines and syllables, and average words per line or sentence, to more complex indices and densities.

Graphics: Graphics and images used in the infographic were edited using Photoshop graphic design software.
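The inflation adjustment mentioned under Quantitative Analysis usually comes down to rescaling each year’s wage by a ratio of price indexes. The sketch below, in Python, illustrates that arithmetic only; it is not the code behind the infographic, the function name is made up, and the index values are placeholders rather than real CPI figures.

```python
def to_base_year_dollars(nominal_wage, cpi_wage_year, cpi_base_year):
    """Express a wage from one year in the dollars of a chosen base year."""
    return nominal_wage * cpi_base_year / cpi_wage_year


# Placeholder index values for illustration only (not actual CPI data)
adjusted = to_base_year_dollars(1.00, cpi_wage_year=50.0, cpi_base_year=200.0)
print(adjusted)  # 4.0: prices quadrupled, so $1.00 then equals $4.00 in base-year dollars
```

Repeating this for every year in the series, with the base year fixed at 2015, produces the kind of inflation-adjusted trend line the chart describes.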

Implementing visualization techniques in faculty research
The image above shows different visualization techniques that might be used to effectively convey data or research conclusions to different types of audiences in various disciplines or industries. Some visualizations can help identify existing or emerging trends, spot irregularities or obscure patterns, and even address or solve issues.

Ask us how to visualize your research
If you want help visualizing your own research findings or wonder if your research lends itself to similar techniques including data acquisition and preprocessing of both quantitative and qualitative data, contact Nandhini Gulasingam at mgulasin@depaul.edu.

ICPSR 2016: The Schedule is LIVE!!

The Inter-university Consortium for Political and Social Research (ICPSR) has posted its 2016 summer workshop schedule.  The program, housed at the University of Michigan, hosts a full schedule of methodological, research, and statistical workshops through the late spring and summer, both in Ann Arbor and other places around the country.

There are two four-week sessions: the first is June 22-July 17 and the second is July 20-August 14.  A four-week commitment can be tricky to make work, but there are also 3-5 day workshops on a variety of topics throughout the summer.

Some noteworthy workshops:

R: Learning by Example (Boulder, CO, June 8-10)

Doing Bayesian Analysis: An Introduction (Ann Arbor, MI (?), July 7-10)

Multilevel and Mixed Models Using Stata (Ann Arbor, MI (?), July 27-29)

There are also classes on structural equation modeling, curating and managing data for reuse, and social network analysis (in R).  The good news is that DePaul is a member of ICPSR, so interested DePaul faculty would get a break on the tuition.  What do you think?  Interested in spending a few days in Ann Arbor?

 

 

How Carolyn Goffman Works


Name: Carolyn Goffman
Location: DePaul
Current Gig:  Instructor, English Lit
One word that best describes how you work:  Absorbed
Current mobile device:  cheap Motorola smart phone
Current computer:  2012 MacPro
What apps/software/tools can’t you live without?  OED
What’s your workspace setup like: clear desk, window, good lamp; computer, iPad (read texts on iPad, write notes on computer)

I write at my desk at home, in front of a window that looks out on my tiny backyard and over the alleyway. It’s bleak in the winter, with bare branches and ugly backs of buildings (and, usually, snow), but in the summer the leaves from the big tree in my yard fill up my view.  I need to be alone when I write (except for my dog, who is always present). Although I deeply admire people who complete novels and dissertations while sitting in a coffee shop, I hate writing in public.  I have pictures on the wall that connect me to my work: one an 1877 print of people walking in Istanbul through rain to hear the reading of the Ottoman Constitution; the other a poster that captures the fleeting optimism of the Young Turk Revolution in 1908. I stick postcards up around the windowsill that remind me of my travels. No pictures of family or friends on my desk. I need to NOT think about the people I love and take care of when I am writing.

What’s your best time-saving shortcut/life hack? Do you automate something that used to be a time sink? Do you relegate email to an hour a day? 

I have a number of roles in my life and they all seem to require different speeds. For my caregiver role, I slow way, way down and I have to be careful not to get lost in that slow-lane vortex. For teaching, I go at medium to top speed, and generally feel energized. Writing is peculiar because it requires complete focus and creates its own pace, which is sometimes much, much too slow. My trick to speed up is to pretend that I have to turn the article in, or give the talk, in two hours. What’s the quick and dirty to put something in presentable shape really fast? Then I do that, and it jump starts me.

What’s your favorite to-do list manager?

I use a classic engagement calendar (paper); supplemented by my Google calendar on my phone, and a notepad app on my phone, where I make plans before I go to sleep at night.

Besides your phone and computer, what gadget can’t you live without and why? 

The refrigerator.

What everyday thing are you better at than everyone else? What’s your secret?

I’m good at making lists and getting things done. I’m also good at listening. 

 What do you listen to while you work?  

Depends on time of day and how hard the writing is.  My most used Pandora stations are Latin Jazz and Classical Guitar.

What do you do to stay inspired? Who are some of your favorite artists?

I listen to books while I walk the dog and do dishes — mysteries, recent fiction. Nothing too complex. I like a narrative that propels itself forward. I like reading and listening to books that are well written, no matter what genre.

What sort of work are you up to now? 

I am working on the story of Mary Mills Patrick, a fearsome woman who ran the Constantinople Woman’s College for 34 years, through wars, revolutions, massacres, and end-of-empire chaos. She went to Turkey in 1871 as a young idealistic missionary and stayed on, replacing her religious fervor with a determination to help women and promote education.

What are you currently reading? 

Different books in different rooms and on iPad and in audiobooks:  Emma Ponafidine’s memoir of escape from the Bolshevik Revolution; Ambassador Morgenthau’s memoir about Constantinople during World War One and the Armenian massacres; Ian Fleming’s From Russia with Love; Theatre Shoes, by Noel Streatfeild; New Yorkers. Currently listening to Sara Paretsky’s Brush Back.

Are you more of an introvert or an extrovert? Maybe both?

Used to be an introvert, now I’m both. I like meeting new people, but I need more me/alone-time than most people.

What’s your sleep routine like?

I am perfectly happy if I get 6.5 or 7 hours a night, and perfectly miserable if not. Sometimes I don’t seem to need as much sleep and enjoy the alone-time of the early morning hours.  

Fill in the blank: I’d love to see _________ answer these same questions.  Why?

George Eliot. She is my god.

What’s the best advice you’ve ever received?

Write what you want to write.

Is there anything else you’d like to add that might be interesting to readers and fans?

It really helps to know that all writers work really hard. It doesn’t come easily to anyone. I admire people who write every day no matter what AND get work finished AND have a life and take care of people.


The How I Work series featured on the re/search blog is shamelessly stolen from Lifehacker’s How I Work series.  The SSRC’s version asks DePaul’s heroes, experts, and individuals of note to share their shortcuts, workspaces, routines, and more. Have someone you want to see featured, or questions you think we should ask?
