Data Sources

CoDES houses a range of healthcare databases that CoDES members can access, subject to the restrictions laid out in licensing and data use agreements. For details on data access procedures and fees, please contact the CoDES director, Dr. Almut Winterstein. Available databases include:

IBM® Marketscan® Research Databases

The IBM® Marketscan® Research Databases contain individual-level, de-identified healthcare claims information from employers, health plans, and Medicaid programs. CoDES has active licenses for the IBM® Marketscan® Commercial Database, IBM® Marketscan® Medicare Supplemental Database, and IBM® Marketscan® Health Risk Assessment Database.

The IBM® Marketscan® Commercial Database includes 2005-2020 health insurance claims for inpatient, outpatient, and outpatient pharmacy encounters, as well as enrollment data from large employers and health plans across the United States that provide healthcare coverage for employees, their spouses, and dependents. The current dataset includes >192 million lives.

The IBM® Marketscan® Medicare Supplemental Database includes 2005-2020 enrollment records along with inpatient, outpatient, ancillary, and drug claims for >12.9 million retirees in the United States with Medicare supplemental coverage through privately-insured fee-for-service, point-of-service, or capitated health plans.

For grant applications and data inquiries, we have created an interactive data query that allows researchers to further explore populations within these data. The data can be queried by predefined characteristics, including age, sex, length of continuous enrollment, calendar year and month, and Chronic Condition Warehouse conditions.

The IBM® Marketscan® Health Risk Assessment (HRA) Database includes 2012-2018 self-reported biometric and health-related behavioral data obtained through surveys of employees of large US corporations and health plans. HRA data are linked to medical, pharmacy, and enrollment data for these employees in the IBM® Marketscan® Commercial Database and can be used to examine the relationships between health behaviors or risks and health outcomes or medical expenditures. Linked data are available for about 5% of beneficiaries.

Medicaid Analytic eXtract (MAX) and T-MSIS Analytic Files (TAF)

MAX and TAF data contain claims for medical care and drug benefits received by beneficiaries covered by Medicaid, the state-run insurance programs for low-income and categorically eligible individuals and families. CoDES has in-house MAX data for more than 120 million beneficiaries residing in the 29 most populous states from 1999-2010 (AL, AR, CA, FL, GA, IA, ID, IL, IN, KS, KY, LA, MA, MN, MO, MS, NC, NE, NJ, NM, NY, OH, SC, TN, TX, VA, WA, WI, WV) and national data (all 50 states plus the District of Columbia) from 2011-2014. The 29 states included in the 1999-2010 MAX data represent 85% of all Medicaid beneficiaries. National data for 2015-2016 are currently being curated.

Medicaid data have been linked to birth certificates from the Florida Department of Health (1999-2014), the Texas Department of State Health Services (1999-2012), and the New Jersey Department of Health (1999-2010). The entire national Medicaid data set includes validated mother-infant linkages.

Medicare fee-for-service claims data

Medicare is a federal health insurance program that provides coverage to people aged 65 years or older and those with disabilities or end-stage renal disease. Annual Medicare enrollment has exceeded 50 million since 2012. Data include claims for inpatient, skilled nursing facility, and hospice care (Part A), as well as outpatient care (Part B) and prescription drugs (Part D). For 2011 through 2015, CoDES has in-house data for a 5% national Medicare sample plus 1 million Florida beneficiaries oversampled from residents of the UF Health catchment area. For 2016-2018, it has data for a 15% national sample of Medicare beneficiaries plus the entire state of Florida. Together these total >8 million lives.

Medicare data are linked to the Long-Term Care Minimum Data Set (MDS) and the Health and Retirement Study (HRS). MDS contains data on health status and resource utilization for residents of long-term care nursing facilities. HRS includes survey data from persons 50 years and older, with information on self-reported health and functional status, insurance, and income.

Enrollment tables for the claims data sets listed above by year can be found here.