How many more men than women suffered vehicular fatalities in the U.S. in 2012?

According to Fatality Analysis Reporting System data reported by the Centers for Disease Control and Prevention, more males than females died in motor vehicle crashes in every single state in 2012 (the latest year for which data are available). The graph below shows the death rate of motor vehicle occupants per 100,000 population, by gender, with states listed alphabetically.

North Dakota ranked highest in the male death rate at 29.3, and Missouri had the most female fatalities in the country at 14.2. In Illinois, the male death rate of 6.3 was nearly double the female rate of 3.2.

Top 5 states for male vehicular death rates
State            Death Rate (per 100K)
North Dakota     29.3
Mississippi      22.3
Wyoming          21.9
Montana          21.9
Oklahoma         19.2

Top 5 states for female vehicular death rates
State            Death Rate (per 100K)
Wyoming          12.9
Montana          10.9
North Dakota     10.5
Arkansas         10.4
Kentucky         10.1

 

Click through to see the enlarged image.

[Image: Motor vehicle occupant death rates by state and gender]

Ask us how to visualize your research
For help visualizing your own research findings, or to see whether your research lends itself to similar techniques (including data acquisition and pre-processing of both quantitative and qualitative data), contact Nandhini Gulasingam at mgulasin@depaul.edu.

Are Chicago’s Safe Passage Routes Located in the Highest Risk Areas?

Safe passage routes to school not only provide a sense of safety for Chicago students from pre-K through high school, they also reduce crime involving students and help increase school attendance. Chicago's Safe Passage program was introduced in 2009 after 16-year-old Fenger High School honors student Derrion Albert was beaten to death by gang members, an attack captured on cell phone video. His death, along with a series of other incidents involving CPS students caught in gang violence, received national attention. Since then, the program has expanded to include schools, parents, residents, law enforcement officials and even local businesses in the effort to provide students with a safe environment. The 51 safe route programs currently available take several forms: safe haven programs, in which students who fear for their safety can find refuge at the local police station, fire house, library and even convenience stores, barbershops and restaurants; patrols along school routes by veterans, parents and local residents; and walking-to-school programs, in which parents and local residents create a presence to help deter unlawful incidents.

The map below shows the number of all crimes committed in the city of Chicago during the current school year, along with the locations of schools and safe routes in the communities that have them. Of the 517 Chicago public schools, only 136 (26.3%) fall within the 51 safe routes. Although the safe routes are located in 37 of the generally high-crime communities (the south, west and northeast sides of Chicago), they do not exist in the pockets with the highest number of crime incidents (1,500+, highlighted in burgundy), where children are the most vulnerable. Of the 47 schools that fall within these extreme-crime areas (1,500+ incidents a year), only 6 have safe routes; the others offer no safe passage options. A list of the schools appears at the end of this post.

Click through to see the enlarged image.


[Image: Map of Chicago Safe Passage routes, schools and crime]

Schools located in extremely high-crime areas of Chicago (schools highlighted in green have safe passage routes):
Bennett, Bowen HS, Bradwell, Camelot Safe – Garfield Park, Camelot Safe Academy, Clark HS, Coles, Community, Ericson, Frazier Charter, Frazier Prospective, Galapagos Charter, Great Lakes Charter, Gregory, Harlan HS, Hefferan, Heroes, Herzl, Hirsch HS, Hubbard HS, Learn Charter – Butler, Leland, Mann, Mireles, Noble Charter – Academy, Noble Charter – Baker College Prep, Noble Charter – DRW, Noble Charter – Muchin, Noble Charter – Rowe Clark, Oglesby, Plato, Polaris Charter, Powell, Schmid, Shabazz Charter – Shabazz, Smith, South Shore Intl HS, Webster, Westcott, Winnie Mandela HS, YCCS Charter – Association House, YCCS Charter – CCA Academy, YCCS Charter – Community Service, YCCS Charter – Innovations, YCCS Charter – Olive Harvey, YCCS Charter – Sullivan, YCCS Charter – Youth Development

 

Implementing visualization techniques in faculty research
The map above reflects the different visualization techniques that might be used to effectively convey data or research conclusions to different audiences across disciplines and industries. Visualizations can help identify existing or emerging trends, spot irregularities or obscure patterns, and even address or solve problems.


Vehicle Theft in Chicago

Even though vehicle thefts accounted for only 3.9% (10,099) of all crimes in Chicago last year, 62% of the stolen vehicles were recovered with severe damage, according to the Chicago Police Department. Most often the vehicles are stolen by organized rings and then sold on the black market, shipped overseas, stripped for parts that are resold to body shops, or even resold to unsuspecting customers. In Chicago, 78.9% of vehicles are stolen from streets, alleys and sidewalks, 8.6% from buildings other than residences, 6.7% from parking lots, 5.5% from residences, and 0.3% from airports.

The map below shows a hot-spot analysis of the communities most and least affected by vehicle theft. The visualization shows statistically significant hot-spots in red, where thefts cluster at levels unlikely to be due to random chance, and statistically significant cold-spots in blue, where few or no thefts occur.

Communities most prone to vehicle theft (not safe): Uptown (3) in the north; Austin (25), Avondale (21), Logan Square (22), Hermosa (20), Humboldt Park (23), West Town (24), East and West Garfield Park (26, 27), Near West Side (28) and North Lawndale (29) in the west; and, in south-central Chicago, Chicago Lawn (66), East and West Englewood (67, 68), Greater Grand Crossing (69), South Shore (43) and Auburn Gresham (71).

Communities least prone to vehicle theft (safe): Edison Park (9), Norwood Park (10), Jefferson Park (11), Forest Glen (12), North Park (13), Dunning (17), Portage Park (15), Lincoln Square (4), North Center (5) and Lincoln Park (7) in the north; and Bridgeport (60), New City (61), Garfield Ridge (56), Clearing (64), Ashburn (70), West Pullman (53), Morgan Park (75), Beverly (72), Washington Heights (73), East Side (52) and Calumet Heights (48) in the south.

Click through to see the enlarged image.

[Image: Statistically significant vehicle theft hot-spots and cold-spots, 2015]

 

Techniques Used
The visualization above uses two major spatial analysis techniques: the vehicle theft locations were geocoded from their addresses, and the Getis-Ord Gi* statistic was then used to generate a hot-spot analysis that identifies statistically significant clusters.


American Life Panel

I had the great fortune to attend the annual meeting of the Population Association of America last week. I first attended when it was in New York City, and was sort of intimidated by it: the heavy hitters in demographic and population health research are all there. The men and women whose work shaped the foundations of most demography students' understandings of the world go to PAA. Sam Preston was on the program, the guy who LITERALLY wrote the book on life table analysis.

I have come to appreciate the depth of the sessions offered. As a demographer and health researcher, I love the fact that at any time there are multiple sessions where I might find something of interest or use to me. This is different from the annual Sociology meetings, where the demography and population health sessions are all held on one day, leaving the demographers either very bored or with a lot of extra time on their hands because many of the sessions fall outside of population and health.

Yes, I am aware that this might make me a bad sociologist.

That said, I wandered into the RAND American Life Panel exhibition. "What? Excuse me, what?" you ask.

Well- let me tell you.

RAND has a standing, nationally representative, probability-sampled panel of respondents that can be deployed for survey research. It started in 2003 with a five-year grant from the NIA (National Institute on Aging) to study methodological issues of internet interviewing among older populations. It has since expanded from 800 panel members over the age of 40 to over 6,000 participants aged 18 and older. This in and of itself is pretty nifty. But it also includes a vulnerable-population cohort: individuals recruited and incentivized from zip code areas with high percentages of Hispanic or low-income residents.

This is cool for primary data collection efforts. Let's say you get some $$ and want to do a survey research project, but you don't have the infrastructure or support for a massive data collection effort. RAND might be a decent avenue for getting responses to your survey.

But even cooler is their data repository. After initial embargoes, the data go into a database that can be used *for free* by researchers. The topics are fairly diverse, including life satisfaction, Social Security and health, presidential polling, health literacy, and more. It's brilliant.

From a demographic/health perspective, some of the more interesting datasets are on Longevity,  Breast Cancer, Long Term Care Insurance, and Health Expectations.

Very cool, indeed.

CO2 Emission

Carbon dioxide (CO2) emissions are both natural and man-made. Natural sources include oceans, soil, plants, animals and volcanoes, while human-related CO2 is emitted through deforestation and through the burning of fossil fuels such as coal, natural gas and oil for transportation and for commercial, industrial and residential energy use. Although human-related emissions account for only 5% of the total, they have increased enormously over time. According to the U.S. EPA, global CO2 emissions have increased 90% since 1970, with fossil fuel combustion and industrial processes as the major contributors (78%), followed by deforestation, land-use change and agriculture.

While there are many ways to reduce carbon emissions, the most effective is to reduce fossil fuel consumption. I pride myself on being environmentally conscious: reducing waste by using energy-efficient products (furnace, light bulbs, etc.), taking public transportation, and recycling and reusing things. Yet, using the "carbon footprint" calculator provided by the U.S. EPA, my annual footprint for home energy, transportation and household waste totaled 18,131 lbs., compared to the U.S. average of 24,550 lbs. for a single householder. And this doesn't include the CO2 emitted to produce and deliver my daily consumption of goods (food, beverages, clothing, etc.) and services (restaurants, the local grocer, etc.), or the energy I use at work (technology equipment, etc.) and while commuting there, based on the 12-15 hours I spend outside my home each day. The tool also revealed that just switching my washing machine from warm to cold water would cut carbon emissions by 150 lbs. per year and save me about $12. If you'd like to see your own carbon footprint and identify ways to reduce consumption and save money, try the EPA's calculator.
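For a concrete sense of scale, the arithmetic behind those comparisons can be reproduced with Stata's display command (a minimal sketch that uses only the figures quoted in this post; the numeric formats are just for readability):

    * My footprint relative to the U.S. single-householder average
    display "Share of the U.S. average: " %4.1f 100*18131/24550 "%"
    display "Pounds below the U.S. average: " %6.0fc 24550-18131
    * The cold-water wash saving as a share of my total footprint
    display "Cold-water saving as a share of footprint: " %4.2f 100*150/18131 "%"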

The following infographic shows the extent and distribution of CO2 emissions in the world, the U.S. and Illinois, including the carbon footprints of certain products.

Click through to see the enlarged image.


[Image: CO2 emissions infographic]

Techniques Used
The above visualization includes 3 types of techniques:

Quantitative Analysis: Bar and pie charts were used to show carbon emissions by sector over time and in 2013.

Statistical Analysis (GIS): Spatial analysis included two major techniques: choropleth maps and classification methods were used to show the distribution of emission levels globally and for the U.S.

Graphics: Images were obtained from Google and modified using Photoshop graphic design software.


Hate Nation

Although we take pride in being a developed nation, we still have a long way to go toward reducing organized hatred, hostility and violence against people who differ from "us" in race, color, ethnicity, nationality, religion, gender or sexual orientation, or who are otherwise designated as marginal within our society.

According to the Southern Poverty Law Center's 2015 Intelligence Report, the number of hate groups active in the U.S. rose from 784 in 2014 to 892 in 2015. The U.S. is home to the world's most notorious hate group, the Ku Klux Klan, which had the largest share of U.S. hate groups that year (21.3%). It was followed by the Black Separatists (20.2%), the Racist Skinheads (10.7%), the White Nationalists (10.7%) and the Neo-Nazis (10.5%). Together, these 5 groups make up 73% of the known hate groups in the U.S. Among the states, Texas reported the largest number, 84, of which 55 were KKK. California came second with 68 groups, mainly Black Separatists and Racist Skinheads. Florida ranked third with 59, of which 22 were Black Separatist groups.

The following infographic shows the extent and distribution of known hate groups in the U.S.

Click through to see the enlarged image.

[Image: Hate groups infographic]

Techniques Used
The above visualization includes 3 types of techniques:

Quantitative Analysis: A bar chart was used to visualize quantitative data on the number of known hate groups.

Statistical Analysis (GIS): Spatial analysis included 3 major techniques: geocoding converted hate group locations to points on the map, while choropleth maps and classification methods were used to show the distribution of hate groups by state and to examine the relationship between race and the density of hate groups in each state.

Graphics: Graphics and images used in the infographics were edited using Photoshop graphic design software.


Naming Variables in Stata (and other Statistical Packages)

Naming variables in Stata and other statistical packages is definitely an exercise in balancing art and science. Researchers and statisticians alike find themselves balancing the need for variable names to be informative against the risk that overly elaborate names lead to accidentally mis-specified variables and messy, problematic analyses. J. Scott Long's The Workflow of Data Analysis Using Stata details three types of naming systems that people usually use when naming variables.


Sequential names use a stub followed by sequential digits, like v7, v11, v013, or something more complicated like R0002203, R002205, etc. The numbers might correspond to the order in which the data were collected or the questions were asked. Because the names carry no meaning, it is easy to use the wrong variable in a set of analyses, or hard to remember the name of the variable you need. Because this risk is very real and wastes a lot of time when you are trying to figure out where the wheels came off of your analysis, some researchers keep a printed list of variable names, descriptive statistics and variable labels, like the one produced by the command "codebook, compact".
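A quick way to produce that kind of reference list (a minimal sketch using Stata's built-in auto dataset so the commands run as-is; the label text is just an illustration):

    * Load a built-in example dataset
    sysuse auto, clear
    * One row per variable: observations, unique values, mean, min, max, and label
    codebook, compact
    * Descriptive variable labels make terse or sequential names easier to track down
    label variable rep78 "Repair record, 1978"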

Source names use information about where the variable came from as part of the name. You might see this in a survey where you have questions q1, q2, q3. A question that comes in multiple parts might get named something like q4a, q4b, q4c: Question 4, parts a, b, and c. With source names, you might have variables that don't fit into the scheme because they pertain to some other aspect of the data collection, like demographic variables or information about the time and site of data collection. If these kinds of variables are part of your dataset, consider how they will be named before including them. These names can be more useful than sequential names, but things can still get dodgy when you are looking at a complicated model full of them.
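A sketch of how a source-named question family might be documented and inspected in Stata (the variables and label text are hypothetical and are created here only so the example runs):

    * Hypothetical multi-part question: Question 4, parts a, b, and c
    clear
    set obs 10
    generate q4a = runiform()
    generate q4b = runiform()
    generate q4c = runiform()
    * Record what each part measures
    label variable q4a "Q4a: satisfaction with neighborhood safety"
    label variable q4b "Q4b: satisfaction with neighborhood schools"
    label variable q4c "Q4c: satisfaction with neighborhood services"
    * Inspect the whole question family at once
    describe q4*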

Mnemonic naming systems use abbreviations that convey the content of the variable (e.g., female, educ, id, state). These can be much more useful because they provide clues about what the variable measures, but some planning is still necessary because of the limitations of statistical packages. For example, Stata allows names up to 32 characters long but truncates them when listing results, so variables like familyincome_1990, familyincome_2000, and familyincome_2010 tend to show up as indistinguishable truncated names in analyses and results tables. It is best to aim for variable names of no more than 12 characters, so that if your statistical package does truncate names, you can still tell which variable is which. This might look like faminc1990, faminc2000, faminc2010.
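One way to make that switch, assuming Stata 12 or later, which supports grouped and wildcard renaming (a sketch; the stand-in variables are created here only so the rename runs):

    * Stand-in variables with the long, truncation-prone names from the example above
    clear
    set obs 1
    generate familyincome_1990 = .
    generate familyincome_2000 = .
    generate familyincome_2010 = .
    * Grouped rename to short mnemonic names
    rename (familyincome_1990 familyincome_2000 familyincome_2010) ///
           (faminc1990 faminc2000 faminc2010)
    * The wildcard form does the same thing in one step:
    * rename familyincome_* faminc*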

Long also suggests that it can be useful to include indicators of the structure of the variable in the name, for example b = binary, i = indicator, n = negatively coded scale, and p = positively coded scale. This lets you know, without having to refer to a codebook, that a variable like bdepres_cesd is a binary item indicating depression based on the CESD.
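For instance, a binary depression flag derived from a CESD total might be built like this (a sketch; cesd_total is a hypothetical variable generated here only so the code runs, and the cutoff of 16 is the conventional CESD screening threshold rather than anything specified in this post):

    * Hypothetical CESD total score (0-60)
    clear
    set obs 20
    generate cesd_total = floor(runiform()*61)
    * b prefix flags a binary variable; the rest of the name says what it measures
    generate byte bdepres_cesd = (cesd_total >= 16) if !missing(cesd_total)
    label variable bdepres_cesd "Binary: probable depression (CESD total >= 16)"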

One final note: be careful with capitalization. Statistical packages handle capital letters in different ways; in Stata, Educ, EDUC, and educ all appear as different variables. This might not be much of a problem on its own, but consider what happens when you convert from one file format (Stata) to another (like Excel), which may drop that extra information (like capitalized letters). When this happens in Stata, if you have three variables like Educ, educ, and EDUC, the second and third variable names can get converted to something like varNUM, which is confusing when you try to work with your data after a file conversion.
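One way to head this off before a file conversion is to standardize the case of every variable name up front, using Stata's rename group command (a minimal sketch, assuming Stata 12 or later):

    * Lowercase all variable names before exporting or converting
    sysuse auto, clear
    rename *, lower
    * (A no-op here, since auto's names are already lowercase. In your own data it
    * prevents Educ/EDUC/educ clashes; Stata cannot create duplicate variable names,
    * so if two names would collide the command errors and lets you resolve it.)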