Stata’s datasignature command

As we have discussed before, Stata file management can be tricky.  There is the incessant and iterative updating of the files.  Different kinds of files do different things: data cleaning, data management, item creation, descriptive analyses, regression analyses.  And then there is version management.


There is a handy little command in Stata called datasignature.  Long story short, a data signature helps protect the integrity of your data.  When you run datasignature set, Stata generates a signature string and stores it with the dataset (save the file afterward so the signature is kept).  The string is built from five components that describe your dataset: the number of cases, the number of variables, a checksum of the variable names, and two checksums of the variables' values.
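As a minimal sketch of setting a signature, the lines below use the auto dataset that ships with Stata.  The filename auto_signed is a hypothetical name chosen for illustration.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Display the signature string without storing it
datasignature

* Compute the signature and store it with the data in memory
datasignature set

* Save so the stored signature travels with the file
* (auto_signed is a hypothetical filename)
save auto_signed, replace
```

If a dataset already carries a signature, datasignature set will refuse to overwrite it unless you add the reset option.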

The next time you load the dataset, you can run the datasignature confirm command, and Stata will report whether or not the data have changed since the signature was set.  If they haven’t changed, Stata will report, “data unchanged since ________” (the date and time the signature was set).  If your data have changed, Stata will instead report that the data have changed since that date and exit with an error.
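A confirmation check might look something like the sketch below.  It assumes it is run from a do-file, uses the auto dataset again, and stores the signed copy in a temporary file (the macro name signed is arbitrary) so that nothing on disk is overwritten.

```stata
* Set up: sign a copy of the example data and save it
sysuse auto, clear
datasignature set
tempfile signed
save `signed'

* Later session: reload and confirm nothing has changed
use `signed', clear
datasignature confirm

* Simulate an unnoticed edit, then confirm again
replace price = price + 1 in 1

* capture keeps the do-file running; noisily still shows the message
capture noisily datasignature confirm
display "return code: " _rc   // nonzero signals a mismatch
```

Putting datasignature confirm (without capture) at the top of an analysis do-file is one way to make the file stop before any results are produced from data that have changed.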

Why might this be important?

This might be crucial for teams of researchers collaborating on a large analysis project, particularly if multiple people are working with the cleaning, management, and analysis files, and if not all of those people have the same level of concern for hygienic data management.  It can become a problem if someone comes back to a dataset not realizing it has changed.  Datasignature can help catch this problem before it affects your results.


Author: Jessica Bishop-Royse

Jessica Bishop-Royse is the SSRC’s Senior Research Methodologist. Her areas of interest include: health disparities, demography, crime, methods, and statistics. She often finds herself navigating the fields of sociology, demography, epidemiology, medicine, public health, and policy. She was broadly trained in data collection, Stata, quantitative research methodology, as well as statistics. She has experience with multi-level analyses, survival analyses, and multivariate regression. Outside of the work context, Jessi is interested in writing, reading, travel, photography, and sport.
