What is data?
Data vs. Statistics
Data are the raw ingredients from which statistics are created. Statistics are useful when you just need a few numbers to support an argument (e.g., "In 2003, 98.2% of American households had a television set," from the Statistical Abstract of the United States). Statistics are usually presented in tables. Statistical analysis can be performed on data to show relationships among the variables collected. Through secondary data analysis, many different researchers can re-use the same data set for different purposes.
Aggregate/Macro Data vs. Microdata
Aggregate or Macro Data are higher-level data that have been compiled from smaller units of data. For example, the Census data that you find on American FactFinder have been aggregated to preserve the confidentiality of individual respondents. Microdata contain individual cases, usually individual people, or in the case of Census data, individual households. The Integrated Public Use Microdata Series (IPUMS) for the Census provides access to the actual survey data from the Census, but eliminates information that would identify individuals.
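The difference between microdata and aggregate data can be sketched in a few lines of Python. This is only an illustration: the household records, field names, and the `aggregate_by_state` function are all invented for this example and do not come from any real Census file.

```python
from collections import defaultdict

# Hypothetical microdata: one record per household (all values invented).
microdata = [
    {"household_id": 1, "state": "TN", "persons": 3, "has_tv": True},
    {"household_id": 2, "state": "TN", "persons": 1, "has_tv": False},
    {"household_id": 3, "state": "CA", "persons": 4, "has_tv": True},
    {"household_id": 4, "state": "CA", "persons": 2, "has_tv": True},
]

def aggregate_by_state(records):
    """Collapse household-level records into one summary row per state."""
    totals = defaultdict(lambda: {"households": 0, "persons": 0, "with_tv": 0})
    for rec in records:
        row = totals[rec["state"]]
        row["households"] += 1
        row["persons"] += rec["persons"]
        row["with_tv"] += int(rec["has_tv"])
    return dict(totals)

# Aggregate data: individual households are no longer visible.
aggregate = aggregate_by_state(microdata)
for state, row in sorted(aggregate.items()):
    print(state, row)
```

Once the records are summed by state, the individual household rows can no longer be recovered from the result, which is the sense in which aggregation protects confidentiality.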
Data Sets, Studies, and Series
In data archives like ICPSR, a data set or study is made up of the raw data file and any related files, usually the codebook and setup files. The codebook is your guide to making sense of the raw data. For survey data, the codebook usually contains the actual questionnaire and the values for the responses to each question. The setup files help you read the raw data into statistical software; without them, the data will not display properly.
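As a rough illustration of how a codebook and setup information work together, the sketch below decodes two lines of a made-up raw data file. The column positions, variable names, and response codes are hypothetical; a real study's codebook and setup files define their own.

```python
# Hypothetical setup information: where each variable sits in a fixed-width line.
COLUMNS = {"respondent_id": (0, 4), "sex": (4, 5), "health": (5, 6)}

# Hypothetical codebook: what each coded response value means.
CODEBOOK = {
    "sex": {"1": "Male", "2": "Female"},
    "health": {
        "1": "Excellent", "2": "Good", "3": "Fair", "4": "Poor",
        "9": "No answer",
    },
}

def decode_line(line):
    """Split one raw line into variables and translate codes into labels."""
    record = {}
    for name, (start, end) in COLUMNS.items():
        raw = line[start:end]
        # Variables without a codebook entry (such as IDs) are kept as-is.
        record[name] = CODEBOOK.get(name, {}).get(raw, raw)
    return record

raw_lines = ["000123", "000219"]  # invented raw data
decoded = [decode_line(line) for line in raw_lines]
for rec in decoded:
    print(rec)
```

Without the column positions and value labels, a line such as 000123 is just a string of digits, which is why the codebook and setup files travel with the raw data file.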
ICPSR uses the term series to describe collections of studies that have been repeated over time. For example, the National Health Interview Survey is conducted annually. In the ICPSR archive, you will find a description of the series that provides an overview. You will also find individual descriptions of each study (e.g., National Health Interview Survey, 2004). The study number in ICPSR refers to the individual survey.
Types of Data
Cross-Sectional describes data that are only collected once.
Time Series data track the same variable over time. The National Health Interview Survey is an example of time series data because the questions generally remain the same over time, but the individual respondents vary.
Longitudinal Studies describe surveys that are conducted repeatedly, in which the same group of respondents is surveyed each time. This allows for examining changes over the life course. The Project on Human Development in Chicago Neighborhoods (PHDCN) Series contains a longitudinal component that tracks changes in the lives of individuals over time through interviews.
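The three types can be told apart by asking who was surveyed and when. The sketch below is a simplification under stated assumptions: `classify_design` is a hypothetical helper, each observation is a (respondent ID, survey wave) pair, and a study counts as longitudinal only if exactly the same respondents appear in every wave. Real longitudinal studies usually lose some respondents over time, so this rule is stricter than practice.

```python
def classify_design(observations):
    """Classify a survey design from (respondent_id, wave) pairs."""
    respondents_by_wave = {}
    for respondent_id, wave in observations:
        respondents_by_wave.setdefault(wave, set()).add(respondent_id)

    # Data collected only once.
    if len(respondents_by_wave) == 1:
        return "cross-sectional"

    # Repeated collection from exactly the same group of respondents.
    groups = list(respondents_by_wave.values())
    if all(group == groups[0] for group in groups):
        return "longitudinal"

    # Repeated collection, but the respondents vary between waves.
    return "time series"

# Invented examples: respondent IDs paired with survey years.
one_time = [("a", 2004), ("b", 2004), ("c", 2004)]
repeated_new_people = [("a", 2003), ("b", 2003), ("c", 2004), ("d", 2004)]
repeated_same_people = [("a", 2003), ("b", 2003), ("a", 2004), ("b", 2004)]

print(classify_design(one_time))
print(classify_design(repeated_new_people))
print(classify_design(repeated_same_people))
```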
(Originally from Sue Erickson at Vanderbilt University http://www.library.vanderbilt.edu/central/FindingData.htm)
CEIC Data contains economic, industrial and financial time-series data. Access limited to 5 simultaneous users. Try again later if refused.
Four databases are included in this package:
• China Premium Databases
The China Premium Database offers over 415,000 time-series records on macroeconomic, sector, industry and regional data dating back to 1949. The China Premium Database is available in both English and Chinese.
• Brazil Premium Databases
The Brazil Premium Database covers 13 macroeconomic and 13 industry growth sectors with unmatched coverage of the energy, biofuel, and automobile sectors. Regional data are also available for Brazil's 5 regions, all 27 states, and 6000 municipalities.
• Global Database
CEIC Data’s Global Database provides granular macroeconomic data on more than 200 economies, covering both developed and emerging markets. It is suited to business students, professors, and lecturers working on macroeconomic research projects and course assignments.
• Daily Database
The EPS Data Platform provides time-series statistical data on China. As of November 2017, it includes 41 China databases sourced from industrial, regional and national organizations, covering various subjects, industries and fields and all regions of China. It offers data retrieval, processing, analysis, forecasting, visualization and data export. EPS China Statistics has both English and Chinese versions, and contains over 1.2 million basic and combined statistical indicators in time series, growing by more than 30 million data points each year. URL to Chinese version: http://olap.epsnet.com.cn/index.html
Xi Chen
Schedule a virtual or in-person appointment:
https://ucsd.libcal.com/appointments/xichen
email: xichen031@ucsd.edu
phone: 858-534-2894
office: Geisel West 2nd Floor