Stata’s datasignature command

As we have discussed before, Stata file management can be tricky.  Files get updated constantly and iteratively.  Different kinds of files do different things: data cleaning, data management, item creation, descriptive analyses, regression analyses.  And then there is version management.

There is a handy little trick in Stata called datasignature.  Long story short, datasignature helps you verify the integrity of your data.  When the command is executed, Stata generates a signature string built from 5 components that describe your dataset: the number of cases, the number of variables, a checksum of the variable names, and two checksums of the values of the variables.
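
As a minimal sketch of what the signature looks like, the lines below use the auto dataset that ships with Stata (chosen here purely for illustration; it is not part of the original workflow).  Running datasignature with no subcommand only computes and displays the signature; it does not store anything.

```stata
* Load an example dataset that ships with Stata (illustration only)
sysuse auto, clear

* Compute and display the signature without storing it
datasignature

* The same string is returned in r(datasignature)
display "`r(datasignature)'"
```

Because auto has 74 cases and 12 variables, the string should begin with 74:12, followed by the checksums.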

To use it, you first store the signature with datasignature set and then save the dataset so the signature travels with the file.  The next time you load the dataset, you can use the datasignature confirm command, and Stata will report whether or not the data have changed since the signature was set.  If they haven’t changed, then Stata will report “data unchanged since ________” (the date and time the signature was set).  If your data have changed, datasignature confirm will report that the data have changed since that date and will return an error.
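
The set-then-confirm cycle can be sketched as follows.  This is only an illustration under stated assumptions: it uses the auto dataset and a temporary file (the local name signed is hypothetical); in a real project you would save to your own project file instead.

```stata
* Store the signature in the dataset, then save so it persists in the file
sysuse auto, clear
datasignature set, reset        // reset overwrites any signature already stored
tempfile signed                 // hypothetical stand-in for your project file
save "`signed'"

* Later session: reload and confirm that nothing has changed
use "`signed'", clear
datasignature confirm

* Simulate an accidental edit, then confirm again
replace price = price + 1 in 1
capture noisily datasignature confirm   // capture keeps the do-file running
display "return code: " _rc             // nonzero because the data changed
```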

Why might this be important?

This might be crucial for teams of researchers collaborating on a large analysis project, particularly if multiple people are working with the cleaning, management, and analysis files, and if not all of those people have similar levels of concern for hygienic data management.  It can become a problem if someone comes back to a dataset not realizing it has changed.  Datasignature can help catch this problem before it affects an analysis.
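
One way a team might apply this, sketched under assumptions: whoever finalizes the cleaned data sets the signature once, and every analysis do-file starts with a check that stops if the data no longer match.  The file handle cleaned, the error message, and the regression are all hypothetical placeholders.

```stata
* --- Done once by whoever finalizes the cleaned data ---
sysuse auto, clear              // stand-in for the team's cleaned dataset
datasignature set, reset
tempfile cleaned                // hypothetical; use a shared project file in practice
save "`cleaned'"

* --- Top of each analysis do-file ---
use "`cleaned'", clear
capture datasignature confirm
if _rc {
    display as error "Data differ from the signed version; check with the team first."
    exit _rc
}

* Analysis runs only if the check above passed
regress price mpg weight
```

Calling datasignature confirm without capture would also halt the do-file when the data have changed; wrapping it in capture simply lets the team substitute its own message.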

Author: Jessica Bishop-Royse

Jessica Bishop-Royse is the SSRC’s Senior Research Methodologist. Her areas of interest include: health disparities, demography, crime, methods, and statistics. She often finds herself navigating the fields of sociology, demography, epidemiology, medicine, public health, and policy. She was broadly trained in data collection, Stata, quantitative research methodology, as well as statistics. She has experience with multi-level analyses, survival analyses, and multivariate regression. Outside of the work context, Jessi is interested in writing, reading, travel, photography, and sport.
