Stats and Methods Mini-Workshops

The Social Science Research Center will present a series of short statistics and methods workshops, beginning in February 2017.  Senior Research Methodologist Jessica Bishop-Royse will present on topics of interest to the DePaul Research Community.  The first of these workshops will be on Stata File and Data Management.

In this session, Jessica will discuss various methods for getting data into Stata, as well as proper file management in order to reproduce results for publication.  This workshop will take place at noon on Thursday, February 23, 2017, in the conference room in Suite 3100 of 990 W. Fullerton.

Resources for Learning Stata

I started thinking about it the other day and realized that I have been using Stata for over ten years.  OVER TEN YEARS!!  Seriously, where has the time even gone?

I realized that we might be due for a nifty resource list for learning Stata.  There is a lot out there, and it can be difficult to wade through what is there when you have different types of questions and problems.  Sometimes I just need a quick reminder on the exact formatting of the code/syntax for a particular command.  Sometimes I am trying to help novice users get their feet wet.  The former needs a solution at the 3 foot level, the latter needs a 30,000 foot view.

Below, I have categorized some of the Stata Resources that I use often.  As I find more and more items, I will come back and add to this list.

Beginning/Broad Knowledge:

A Gentle Introduction to Stata– Last published this year, this text is in the 5th edition and is good for learning the specifics of data management, exploratory statistics, and analyses.

Princeton Website– A step by step tutorial for working in Stata.  Also covers a lot more of the “why”.

UNC Pop Center Website– Has commands sorted by function for ease.  If you are looking for help on combining data files, you’d find five example commands you could use.

IDRE UCLA Website– This is a wonderful resource for interpreting Stata output.

YouTube Channels:

edX Channel on Getting Started with Stata

David Braudt Introductory Stata workshop

Specific Knowledge:

Stata PDF Manuals– These are PDF files that are accessible from the Help menu of Stata.  The manuals are organized according to type of analysis; for example, there is a document for Time Series Analysis and one for Structural Equation Modeling.

The Workflow of Data Analysis Using Stata– Despite the fact that this edition was published in 2008, there is a lot of extremely valuable information here on how to organize Stata data and do files, as well as version control.

YouTube Channels:

StataCorp Channel

Field Learning

Newly graduated Master of Public Health (MPH) students Adenike Sosina and Joselyn Williams recently talked about the extra-curricular skills they acquired as research assistants at the Center for Community Health Equity (CCHE). Their analysis of one project will be displayed at the 9th annual Health Disparities & Social Justice Conference that CCHE and MPH will host at the DePaul Center on August 12.

In a conference poster, they will summarize the focus group discussions that CCHE helped Rush conduct in conjunction with Rush Medical Center’s comprehensive Community Health Needs Assessment. The focus groups were made up of residents and stakeholders from the 8 Chicago West Side community areas (West Town, Austin, East Garfield Park, West Garfield Park, Near West Side, North Lawndale, South Lawndale, and Lower West Side) and 3 near west suburbs (Forest Park, Oak Park, and River Forest) that Rush serves. They were formed to discover what Adenike described as “the impact of the communities’ perceptions, their needs, things they believed to be beneficial.” That should help Rush understand what makes a good community and what relationships community members value, Joselyn added.

The two researchers began working at CCHE and with CCHE Co-Director and Associate Professor of Sociology Fernando De Maio in 2015—Adenike as CCHE Program Assistant and Joselyn as CCHE Graduate Assistant. Founded jointly in 2015 and based at DePaul, CCHE is a partnership between DPU and Rush designed to link social scientists, students, community groups, and health care professionals in a search for data-based solutions to community health problems.

Last fall and winter Adenike and Joselyn collaborated with CCHE on the assessment report Rush prepares every three years to evaluate the overall state of health in its service areas and to develop internal implementation strategies and community collaborations. Using NVivo software, they later analyzed 11 “massive” focus group transcripts—also prepared by a number of DePaul and Rush students—to identify recurring themes such as resources, education, socialization, social division, health care, safety, responsibility, and ownership, Adenike said.

“The software itself served as a resource,” said Joselyn, a self-described ‘data nerd’. “[It’s] kind of intuitive. There’s not a lot of bulky things you have to have previous help with.” The researchers also utilized SSRC technical and consulting resources for transcribing the focus group discussions and for training in GIS and mapping fundamentals. The poster will illustrate the findings of their analysis.

“There was an array of other concerns, besides health, in which they wanted their voices to be heard,” said Adenike. She was impressed by the range of what focus group participants wanted to convey. Across communities, focus groups cited the lack of resources, including insufficient recreational outlets for youth, job opportunities, access to retail and good food, and inadequacies in the city’s educational system.

“…It’s like we’re almost a forgotten community…,” a member of the North Lawndale focus group complained. “And if we could just get a lot of these young guys some work and young women and young men to work, it will be a big change in the community,” a West Garfield Park participant offered.

In conversations about what they liked about their communities, participants voiced “probably a lot more positive thoughts around social cohesion,” Joselyn observed. “Most identified with their community,” she said. “I didn’t feel like anyone said ‘this is per se a bad community.’ They recognized the good and the bad. They wanted the community to be better.” Discussions about how Rush might partner with the community produced suggestions for collaborating with schools, operating mobile clinics to provide services such as back-to-school vaccinations, or pairing medical school students with community teens around health issues and mentoring, Adenike noted.

Both MPH graduates agreed that their work at CCHE leaves them feeling better prepared as they start their own careers. Joselyn, who made some GIS maps for the assessment to show where Rush ranked in child opportunity and hardship indices, appreciated the opportunity to work alongside hospital administrators and to observe how a big organization undertakes a report of this scope. She was struck by the length of the assessment process.

This fall Joselyn will begin teaching English to elementary students in the Gyeongbuk province in South Korea. From there she hopes to explore opportunities for a career abroad in global health. Adenike wants to work in community health practice after her position at CCHE ends in late summer. She’s especially interested in childhood obesity interventions.

At CCHE, graduate and undergraduate student researchers will continue to gain project-based experience working on analyses of the new Healthy Chicago Survey, the creation of an “Index of Concentration at the Extremes” for Chicago census tracts, and comparative analyses of health inequities in Chicago and other cities. DePaul faculty and students will continue collaborating with the Chicago Department of Public Health and other groups across the city as they build on CCHE’s contribution to “Healthy Chicago 2.0”, the city’s four-year initiative to assess and improve health and well-being and reduce inequities among Chicago communities.

Visit CCHE’s website to see the Rush Community Health Needs Assessment report and to learn more about the upcoming Health Disparities & Social Justice Conference at DePaul. Faculty or students doing research on faculty projects who want to access NVivo are invited to contact the SSRC where the program is available in our Lincoln Park computer lab or through remote connection.

Economic Inequality According to Adam Smith

Eliminate poverty and economic inequality disappears.  Not so, says DePaul Political Science Professor David Lay Williams, who treated a recent Mess Hall audience at the SSRC to a preview chapter from ‘The Greatest of All Plagues’: Economic Inequality in Western Political Thought, a book he’s writing for Princeton University Press.

Returning to an examination of seminal free-marketeer Adam Smith, Williams traces the recurring theme of economic inequality throughout Smith’s writings, particularly in his less celebrated book, The Theory of Moral Sentiments.  And while he finds Smith’s solutions for alleviating desperate poverty stronger than those addressing economic inequality, he points out that Smith was quick to recognize potential pitfalls of inequality at the nascent roots of capitalism.

Smith, whose own 18th Century Scotland was marked by great economic inequality, ascribed its development to a combination of people’s tendencies to base their actions on self-interest, the desire for rank and distinction, and an appetite for both superiority and domination over others.  In commercial societies where people are considered responsible for their station in life, and where success is measured by wealth and poverty equals failure, two separate moral codes can evolve, observed Smith.  People’s inclination to worship the rich allows the rich to indulge in a very lax moral code, one that tolerates their foibles while subjecting the poor to life-long punishment for theirs.  Likewise, those with greater wealth will also enjoy greater political authority, continues Smith’s critique.

To Williams, relieving poverty wouldn’t address the pathologies Smith identified or control badly performing political institutions.  What Smith described as the “natural selfishness and rapacity” of the rich has both individual and societal implications.  Pitted against the morally corrupting effects on individual character that Smith warned of, the interests of the poor barely register on the radar of the rich, Williams said.  The more disproportionate the wealth, the more violently and unjustly the rich will treat the poor, a Smithian observation not generally remarked on, Williams noted.

In other chapters of his book, Williams will examine the issue of economic inequality through the lens of Plato, St. Augustine, Hobbes, Rousseau, Mill, and Marx.

Naming Variables in Stata (and other Statistical Packages)

Naming variables in Stata and other statistical packages is definitely a practice in balancing art and science.  Researchers and statisticians alike find themselves balancing their need for variable names to be informative, but not so informative that one accidentally mis-specifies variables and produces messy and problematic analyses.  J. Scott Long’s The Workflow of Data Analysis Using Stata details the three types of naming systems that people usually utilize when naming variables.

Sequential names use a stub followed by sequential digits, like v7, v11, v013 or something more complicated like R0002203, R002205, etc.  The numbers might correspond to the order that the data were collected in or the questions were asked.  Because the names don’t have any meaning, it is easy to use the wrong variable in a set of analyses, or difficult to remember the name of the variable you need.  Because this risk is very real and wastes a lot of time when trying to figure out where the wheels came off your analysis, some researchers refer to a printed list of variable names, descriptive statistics, and variable labels, like what is produced by the command “codebook, compact”.
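
If you want to try that command, here is a minimal sketch.  I am assuming the auto dataset that ships with Stata as a stand-in for your own file; price, mpg, and weight are variables in that example dataset, not anything specific to your data.

```stata
* Illustration only: auto.dta ships with Stata; substitute your own file
sysuse auto, clear

* One row per variable: name, observations, unique values, mean, min, max, label
codebook, compact

* Restrict the listing to a few variables when the dataset is wide
codebook price mpg weight, compact
```

Printing that output, or keeping it open in a second window, gives you the kind of lookup list that makes sequential names a little less risky.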

Source names are those that use the information about where the variable came from as part of the name.  You might see this in a survey where you have questions q1, q2, q3.  A question that comes in multiple parts might get named something like q4a, q4b, q4c.  This would be Question 4, parts a, b, and c.  With source names, you might have variables that don’t fit into the scheme, because they might pertain to some aspect of the data collection, like demographic variables or information about the time and site of data collection.  If these kinds of variables are part of your dataset, consider how they will be named prior to including them in your dataset.  These types of names can be more useful than sequential names, but it can still be dodgy when you are looking at a complicated model with these types of names.

Mnemonic naming systems use abbreviations that convey content of the variable (e.g. female, educ, id, state, etc).  These can be much more useful because they provide clues as to what they pertain to.  While these names can be more useful, some consideration is necessary when planning names, because of limitations of statistical packages.  For example, Stata allows for names that are 32 characters long, but will truncate names when listing results.  The default in Stata tends to truncate variables like familyincome_1990, familyincome_2000, and familyincome_2010 as familyincome, familyincome, familyincome in analyses and results tables.  It is best to aim for variable names that are no more than 12 characters, so that if your statistical package does truncate variable names, you can still tell which variable is which.  This might look like: faminc1990, faminc2000, faminc2010.
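
Here is a rough sketch of that renaming step.  The toy dataset is an assumption, built on the spot with empty variables so the example runs by itself.  It relies on the wildcard form of rename, which I believe arrived in Stata 12; older versions would need three separate rename commands.

```stata
* Toy dataset, built only so the example is self-contained
clear
set obs 1
generate familyincome_1990 = .
generate familyincome_2000 = .
generate familyincome_2010 = .

* Wildcard rename: whatever * matches on the left is reused on the right,
* so familyincome_1990 becomes faminc1990, and so on
rename familyincome_* faminc*

* Confirm the shorter names
describe
```

Doing the rename once, near the top of a data-preparation do-file, means every later table and model uses the short names consistently.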

Long also suggests that it might be useful to include indicators about the structure of the variable in the name.  You might include b=binary, i=indicator, n=negatively coded scale, and p=positively coded scale.  This lets you know, without having to refer to a codebook, what you are looking at: when you see a variable like bdepres_cesd, you know that this is a binary item indicating depression based on the CESD.
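
As a small sketch of how a prefix like this might be applied: the starting variable depres_cesd and its label text are hypothetical, made up here so the example runs on an empty dataset.

```stata
* Hypothetical 0/1 depression flag, created only for illustration
clear
set obs 1
generate byte depres_cesd = 0

* Add the b prefix to mark the variable as binary
rename depres_cesd bdepres_cesd
label variable bdepres_cesd "Depressed (binary, based on CESD)"

* List everything whose name starts with b
describe b*
```

One caution: describe b* matches any name that starts with b, so the prefix only helps if you reserve those letters for this purpose.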

One final note: be careful with capitalization.  Statistical packages deal with capitalized letters in different ways.  Stata is case sensitive, so Educ, EDUC, and educ are three different variables.  This might not be too much of a problem for you, but consider what happens when you convert from one file format (Stata) to another (like Excel), which may result in dropping extra information (like capitalized letters).  When this happens in Stata, if you have three variables like Educ, educ, and EDUC, the second and third variable names will get converted to something like varNUM, which can be confusing when you are trying to work with your data after a file conversion.
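
One way to head this off before converting a file is sketched below.  It is only a sketch: the dataset is a toy one, and the replacement names educ_a and educ_b and the extra variable Female are hypothetical.

```stata
* Toy dataset with names that differ only by capitalization
clear
set obs 1
generate Educ = .
generate EDUC = .
generate educ = .
generate Female = .

* Stata lists Educ, EDUC, and educ as three separate variables
describe

* Resolve the clashes by hand first, since lowercasing Educ or EDUC
* would collide with the existing educ
rename Educ educ_a
rename EDUC educ_b

* Then lowercase whatever mixed-case names remain (here, Female)
rename *, lower
describe
```

After this, no two names differ only by case, so a conversion that ignores capitalization has nothing to collide on.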

ICPSR 2016: The Schedule is LIVE!!

The Inter-university Consortium for Political and Social Research (ICPSR) has posted its 2016 summer workshop schedule.  The program, housed at the University of Michigan, hosts a full schedule of methodological, research, and statistical workshops through the late spring and summer, both in Ann Arbor and other places around the country.

There are two four-week sessions, the first June 22-July 17 and the second July 20-August 14, which can be tricky to fit into a schedule.  But there are also 3-5 day workshops on a variety of topics throughout the summer.

Some noteworthy workshops:

R: Learning by Example (Boulder, CO June 8-10)

Doing Bayesian Analysis: An Introduction (Ann Arbor, MI (?) July 7-10)

Multilevel and Mixed Models Using Stata (Ann Arbor, MI (?), July 27-29)

There are also classes on structural equation modeling, curating and managing data for reuse, and social network analysis (in R).  The good news is that DePaul is a member of ICPSR, so interested DePaul faculty would get a break on the tuition.  What do you think?  Interested in spending a few days in Ann Arbor?