
Research Data Management: Best Practices for Managing Your Data

Research data management planning, concepts, support, UC San Diego Research Data Curation Program

Choose Sustainable Formats and Metadata Standards

Sustainable data formats, which are more likely to remain accessible in the future, have the following attributes:

  • Open, documented standards
  • In widespread use (preferably non-proprietary)
  • Contain as much of the original information as possible (uncompressed)
  • Use standard encodings
  • Self-describing: contain metadata needed to interpret the content, context, and/or structure of the record
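Standard encodings are one attribute you can check yourself. Below is a minimal Python sketch of one way to do that. It assumes the legacy file is in Windows-1252 (cp1252), which is a common encoding for older spreadsheet exports but only a guess. Confirm the real encoding before converting. The function names are illustrative.

```python
from pathlib import Path


def is_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file as UTF-8, leaving line endings untouched.

    source_encoding is an assumption (Windows-1252 by default); decoding is
    strict, so a wrong guess raises UnicodeDecodeError instead of silently
    corrupting characters.
    """
    text = src.read_bytes().decode(source_encoding)  # strict by default
    dst.write_bytes(text.encode("utf-8"))
```

The conversion works on bytes rather than text mode, so Windows-style line endings in the source survive unchanged.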

Describe Your Data Using Standards

Following a metadata standard promotes discoverability and use of your data. Metadata provides information about the content, provenance, quality, use and/or accessibility of data. See:
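Dublin Core is one widely used, general-purpose metadata standard. A discipline-specific standard may fit your data better. The following sketch uses only Python's standard library to write a simple Dublin Core record as XML. The wrapper element name "metadata" and the function name are illustrative assumptions. The element names and namespace URI are those of the Dublin Core Metadata Element Set, version 1.1.

```python
import xml.etree.ElementTree as ET

# Namespace URI for the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of simple Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize a flat dict of Dublin Core fields to an XML string."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting unknown field names catches typos such as "author" for "creator" before the record is shared.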

Using Excel

Best Practices:

  • Use in conjunction with a "Data Dictionary" (see Define Your Data Dictionary below) containing information about:
    • Variable names
    • Variable types
    • Codes and ranges
    • Missing values
  • Place variable names in row 1
  • Always have a unique identifier per entity
  • Keep track of changes made to worksheet
  • Format columns to match the variable type (date, numeric, text, etc.)
  • Data entry guidelines:
    • Freeze column headings so they will not scroll off the screen
    • Enter string variables in a consistent case
    • Do not leave any blank rows in the spreadsheet
    • Do not include non-essential text or fancy formatting in the spreadsheet
    • Remove formulas: copy the entire spreadsheet and paste it into a new sheet using the "Values" option
    • Sort data with caution (always SAVE first)
  • Verify data using double data entry
  • Save as .csv for forward compatibility and interoperability
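Once a sheet is saved as .csv, several of these checks can be automated. The Python sketch below assumes a UTF-8 CSV file and an identifier column name that you pass in. The function names check_sheet and compare_double_entry are illustrative and do not belong to any tool mentioned in this guide. check_sheet reports blank or duplicate variable names in row 1, blank rows, and missing or repeated identifiers. compare_double_entry lists the cells where two independent entries of the same data disagree.

```python
import csv
from pathlib import Path


def check_sheet(path: Path, id_column: str) -> list[str]:
    """Return a list of problems found in a CSV exported from a spreadsheet.

    Row numbers count CSV records, which match spreadsheet rows as long as
    no cell contains a line break.
    """
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if not rows:
        return ["file is empty"]
    problems: list[str] = []
    header = rows[0]
    # Variable names belong in row 1: no blanks, no duplicates.
    if any(not name.strip() for name in header):
        problems.append("row 1 has a blank variable name")
    if len(set(header)) != len(header):
        problems.append("row 1 has duplicate variable names")
    if id_column not in header:
        return problems + [f"identifier column {id_column!r} not found"]
    id_index = header.index(id_column)
    seen: set[str] = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {line_no} is blank")
            continue
        entity_id = row[id_index].strip() if id_index < len(row) else ""
        if not entity_id:
            problems.append(f"row {line_no} has no {id_column}")
        elif entity_id in seen:
            problems.append(f"row {line_no} repeats {id_column} {entity_id!r}")
        seen.add(entity_id)
    return problems


def compare_double_entry(first: Path, second: Path) -> list[tuple[int, int]]:
    """Return 1-based (row, column) positions where two entries differ."""
    with first.open(newline="", encoding="utf-8") as f1, \
         second.open(newline="", encoding="utf-8") as f2:
        a = list(csv.reader(f1))
        b = list(csv.reader(f2))
    mismatches = []
    for r in range(max(len(a), len(b))):
        row_a = a[r] if r < len(a) else []
        row_b = b[r] if r < len(b) else []
        for c in range(max(len(row_a), len(row_b))):
            cell_a = row_a[c] if c < len(row_a) else None
            cell_b = row_b[c] if c < len(row_b) else None
            if cell_a != cell_b:
                mismatches.append((r + 1, c + 1))
    return mismatches
```

Each mismatch reported by compare_double_entry points to a cell that should be checked against the original source record.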

Resources:

  • DataUp: an Excel add-in that helps individuals document and prepare Excel spreadsheets for archiving and sharing
  • Elliott, A. C. (2006). Preparing data for analysis using Microsoft Excel. Journal of Investigative Medicine, 54(6), 334-341.

Create a Data Register

Create a text document or table that includes:

  • what data you're collecting
  • format(s)
  • naming convention
  • location where you're storing the data
  • owner (who's collecting, creating, or responsible for the data)
  • access (who is allowed access)
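A data register can be kept as a simple table. One way is sketched below in Python. It writes one CSV row per dataset, with columns mirroring the list above. The RegisterEntry class, its field names, and write_register are illustrative assumptions.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class RegisterEntry:
    """One row of a data register; fields mirror the list above."""
    data: str               # what data you're collecting
    formats: str            # file format(s)
    naming_convention: str  # how files are named
    location: str           # where the data are stored
    owner: str              # who collects, creates, or is responsible for the data
    access: str             # who is allowed access


def write_register(path: Path, entries: list[RegisterEntry]) -> None:
    """Write the register as a CSV table with a single header row."""
    columns = [f.name for f in fields(RegisterEntry)]
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        for entry in entries:
            writer.writerow(asdict(entry))
```

Keeping the register as CSV follows the same forward-compatibility advice given for spreadsheets above.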

Define Your Data Dictionary

Example Data Dictionary

Example from Hook, Les A., et al. 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf) from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010
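A data dictionary can also be used to check values as they are entered. The Python sketch below stores each entry's name, type, allowed codes or numeric range, and missing-value code. It then checks one raw value against that entry. The Variable class, check_value, and the default missing-value code "-9999" are illustrative assumptions. Substitute the codes your own dictionary documents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    """One data dictionary entry."""
    name: str
    kind: type                        # int, float, or str
    minimum: Optional[float] = None   # numeric range, if any
    maximum: Optional[float] = None
    codes: Optional[set[str]] = None  # allowed values for coded variables
    missing: str = "-9999"            # assumed missing-value code


def check_value(var: Variable, raw: str) -> Optional[str]:
    """Return an error message if raw violates the entry, else None."""
    if raw == var.missing:
        return None  # a documented missing value is acceptable
    if var.codes is not None:
        return None if raw in var.codes else f"{var.name}: {raw!r} is not an allowed code"
    try:
        value = var.kind(raw)
    except ValueError:
        return f"{var.name}: {raw!r} is not a valid {var.kind.__name__}"
    if var.minimum is not None and value < var.minimum:
        return f"{var.name}: {value} is below {var.minimum}"
    if var.maximum is not None and value > var.maximum:
        return f"{var.name}: {value} is above {var.maximum}"
    return None
```

For example, with a hypothetical temperature variable `Variable("temp_c", float, minimum=-5.0, maximum=40.0)`, the value "55" is reported as above range, while "-9999" passes as a documented missing value.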

Establish a Descriptive File and Dataset Naming Convention

A consistent convention will help you easily identify your files and what they contain. Use abbreviated descriptive information such as:

  • project
  • content or parameter
  • location, date and/or time (yyyymmdd for easy sorting; hhmmssTZD for time)
  • version number (establish numbering system for versions)

Use only letters, numbers, dashes, and underscores. Do not use spaces or special characters. Keep names concise enough to be practical.
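A convention like this is easiest to follow when filenames are generated rather than typed. The following Python sketch builds names of the form project_parameter_location_yyyymmdd_vNN.ext. The field order, the two-digit "v" version prefix, and the function name are assumptions for illustration. Underscores separate the parts, so each part may contain only letters, digits, and dashes.

```python
import re
from datetime import datetime

# Each part may use only letters, digits, and dashes (underscore is the separator).
SAFE_PART = re.compile(r"[A-Za-z0-9-]+")


def build_filename(project: str, parameter: str, location: str,
                   when: datetime, version: int, extension: str) -> str:
    """Join descriptive parts with underscores, e.g. proj_param_site_20140401_v01.csv."""
    parts = [project, parameter, location]
    for part in parts:
        if not SAFE_PART.fullmatch(part):
            raise ValueError(f"unsafe characters in {part!r}")
    date = when.strftime("%Y%m%d")  # yyyymmdd sorts chronologically
    return "_".join(parts + [date, f"v{version:02d}"]) + "." + extension
```

Zero-padding the version number keeps v02 sorting before v10 in a file listing.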

Document Your Workflow

Create a "readme.txt" file that details the steps you took to generate and process your data.

Use a processing and analysis tool that creates and retains a scripted program or structured workflow:
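When processing is scripted, the script can also write the readme.txt described above. The Python sketch below shows one way to do this. Each step is recorded with a UTC timestamp as it runs, and the steps are then written out as a numbered list. The WorkflowLog class and its methods are hypothetical names used only for this illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


class WorkflowLog:
    """Record each processing step so it can be written to readme.txt."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.steps: list[str] = []

    def step(self, description: str) -> None:
        # ISO 8601 UTC timestamp keeps the order of steps unambiguous.
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        self.steps.append(f"{stamp}  {description}")

    def write_readme(self, path: Path) -> None:
        lines = [self.title, "=" * len(self.title), ""]
        lines += [f"{i}. {s}" for i, s in enumerate(self.steps, start=1)]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling `step()` next to each processing operation keeps the record in sync with what the script actually did.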

Effective Data Practices: References

  • Data Management 101 (DCXL project)
  • Best Practices for Preparing Environmental Data Sets to Share and Archive (pdf) by Hook et al., 2010.
  • DataONE Best Practices database.
  • UK Data Archive: how-to guides and resources on data management.
  • Some Simple Guidelines for Effective Data Management by Elizabeth T. Borer et al., Bulletin of the Ecological Society of America 90(2) 205-214, including:
    • store a copy of your original raw data as read-only, and make copies to use in analysis
    • provide descriptive filenames and designate the first row of tables as a header
    • organize records in rows, using column headings that allow analysis within columns rather than across columns, for example: SITE YEAR RAIN TEMP SPEC_NAME POP
    • set up your tables so that you do not have to add columns when adding data
    • use ASCII characters to minimize translation problems with software programs
    • your data tables should contain only data; put comments in a read.me text file that accompanies the table
  • DataCite on why and how to cite data
  • Practical Data Management, April 2014 webinar by Dr. Kristin Briney for the ACRL Digital Curation Interest Group
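Two of the Borer et al. guidelines lend themselves to a short script: keeping a read-only copy of the original data, and arranging a table so that new data adds rows rather than columns. The Python sketch below is an illustration under stated assumptions and is not code from that paper. The function names and the "raw" directory layout are hypothetical. wide_to_long assumes every column except the identifier holds the same kind of value, such as yearly rainfall.

```python
import shutil
import stat
from pathlib import Path


def archive_raw(src: Path, raw_dir: Path) -> Path:
    """Copy the original file into raw_dir and make the copy read-only."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    dst = raw_dir / src.name
    shutil.copy2(src, dst)  # also preserves timestamps
    # Read permission for owner, group, and others; no write permission.
    dst.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return dst


def wide_to_long(rows: list[dict[str, str]], id_col: str,
                 var_name: str, value_name: str) -> list[dict[str, str]]:
    """Turn one column per year (or other variable) into one row per observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_col:
                continue
            long_rows.append({id_col: row[id_col], var_name: column, value_name: value})
    return long_rows
```

After reshaping, a site's rainfall for a new year becomes a new row under SITE, YEAR, and RAIN headings, so the table never needs an extra column.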