Study Data | National Institute of Environmental Health Sciences
Source: https://www.niehs.nih.gov/research/atniehs/labs/crb/studies/pegs/about/data
Archived: 2026-04-23 17:14
PEGS: Personalized Environment and Genes Study
PEGS Data Freezes
The PEGS data are stored securely in a single centralized, shared repository to ensure consistent, reproducible, and comparable analyses. PEGS comprises a compatible, multi-dimensional collection of datasets in consistent and programmatically extractable formats, as shown in the figure on the right and in the Data Components section below. PEGS data are updated quarterly with additional participants, new variables, participant updates, and any additional data components. We are continually building analysis pipelines and workflows to enable efficient, reproducible, insightful, and collaborative research using the PEGS data.
Data Components
Data components available to researchers from the PEGS cohort are listed with their descriptions and sample sizes (the number of participants). The latest versions of the administered participant surveys are also provided.
Category | Component | Description | Documents | Number of Participants
Survey Data | Demographic and Administrative Data | Demographics, consent, address and administrative data for all participants | | 19,445
Survey Data | Health & Exposure Survey | Demographics, health, family history of disease, environmental exposures, socioeconomic status and lifestyle | Health & Exposure Survey (338KB) | 9,449
Survey Data | External Exposome Survey (Exposome A) | Residential and occupational environmental exposures | External Exposome Survey (7MB) | 3,618
Survey Data | Internal Exposome Survey (Exposome B) | Medication use, physical activity, stress, sleep, diet, genetics and reproductive history | Internal Exposome Survey (11MB) | 3,071
Survey Data | Diabetes Screener Survey | Diabetes screener administered to participants with self-reported diabetes | Diabetes Screener Survey (69KB) | 227
Survey Data | Eczema Screener Survey | Eczema screener administered to participants with self-reported eczema | Eczema Screener Survey (92KB) | 329
Survey Data | Right-not-to-know Main Survey | Right-not-to-know Survey administered for incidental findings reports | | 231
Survey Data | Right-not-to-know Cognitive Interview Survey | Right-not-to-know Cognitive Interview administered to assess awareness of incidental findings reports | Right-not-to-know Cognitive Interview Survey (1MB) | 12
Medication Data | Anatomical Therapeutic Chemical (ATC) Codes | ATC codes for self-reported free-text medication names from the Internal Exposome Survey (Exposome B) as per the World Health Organization's (WHO's) ATC classification system | | 2,263
Geospatial Data | Geocodes (GIS) | Geocoded participant addresses from five study events with mapping coordinates | | 18,462
Geospatial Data | Hazards Data | Exposure estimates and proximity measures calculated using geospatial linkages from the following databases: Atmospheric Composition Analysis Group (ACAG), Toxics Release Inventory (TRI), Center for Air, Climate, and Energy Solutions (CACES), North Carolina Department of Environmental Quality (NCDEQ), Department of Transportation (DOT), Federal Aviation Administration (FAA), Federal Communications Commission (FCC) and the Nuclear Regulatory Commission (NRC) | | 18,462
Geospatial Data | MERRA-2 Data (Earthdata) | Geospatial data linkages from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) project containing consistent estimates of climate and environmental metrics from a range of satellite-based environmental observations | | 17,273
Geospatial Data | Social Vulnerability Index (SVI) Data | Geospatial data linkages for CDC/ATSDR Social Vulnerability Index containing summaries of social determinants of health at the census tract level | | 17,273
Genomic Data | Candidate Gene/SNP Data | Candidate SNP data for a subset of participants for specific research goals | | 12,316
Genomic Data | Single Nucleotide Variants (SNVs) | SNV and small indel genotypes derived from the whole-genome sequencing (WGS) data in PLINK's .bed/.bim/.fam format | | 4,737
Genomic Data | Structural Variants | Structural variant calls generated from the WGS data in .vcf format consisting of large deletions, duplications, and inversions | | 4,737
Genomic Data | Human Leukocyte Antigens (HLA) Genotypes | HLA genotypes identified from the WGS data for 20 HLA genes with up to six digits of specificity | | 4,737
Genomic Data | Telomeric Content | Aggregate telomeric content estimated from WGS reads reported as telomeric reads per GC content-matched million reads | | 4,737
Genomic Data | Local and Global Ancestry Estimations | Inferred local ancestry per chromosome after haplotype phasing and global estimates of percent ancestry for each participant | | 4,730
Genomic Data | Methylation Data | Genome-wide methylation profiling data using the Infinium MethylationEPIC v1.0 BeadChip Kit targeting 866,297 CpG sites | | 4,724
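The HLA genotypes above are described as having "up to six digits of specificity". In current HLA nomenclature that corresponds to the first three colon-separated fields of an allele name (allele group, specific protein, synonymous coding variant). Analyses often collapse calls to a coarser, common resolution before comparing participants. The following is a minimal sketch of that truncation; the allele strings and the function name `truncate_hla` are hypothetical, and the actual layout of the PEGS HLA files is not described on this page.

```python
def truncate_hla(allele: str, digits: int = 4) -> str:
    """Truncate an HLA allele name to 2-, 4-, or 6-digit resolution.

    Assumes names of the form GENE*FIELD1:FIELD2[:FIELD3...], for
    example "A*02:01:01". Each colon-separated field counts as two digits.
    """
    if digits not in (2, 4, 6):
        raise ValueError("digits must be 2, 4, or 6")
    gene, sep, fields = allele.strip().partition("*")
    if not sep or not fields:
        raise ValueError(f"not an HLA allele name: {allele!r}")
    # Keep only as many fields as the requested resolution allows.
    kept = fields.split(":")[: digits // 2]
    return f"{gene}*{':'.join(kept)}"


if __name__ == "__main__":
    # Hypothetical calls, for illustration only.
    for call in ["A*02:01:01", "DRB1*15:01", "B*07:02:01"]:
        print(call, "->", truncate_hla(call, digits=4))
```

A call that is already coarser than the requested resolution is returned unchanged, so mixed-resolution input does not raise an error.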
Survey Summary
Categories of survey questions administered to the participants in the Health & Exposure Survey are provided.
Health & Exposure Survey
About Your Family's Health
Diabetes and Endocrine
Neurologic
About Your General Health
Digestive
Occupation
About Your Home Life
Exposures
Renal
About Your Mood
Fatigue
Reproductive (Females Only)
Bones, Joints, and Muscles
Hematological
Reproductive (Males Only)
Cancer
Immune
Respiratory
Cardiovascular
Lifestyle
Skin, Eyes, and Hair
Categories of survey questions administered to the participants in the External Exposome Survey (Exposome Survey - Part A) are provided.
External Exposome (Exposome A)
Characteristics of Current and Past Residences:
• Agricultural Property Use
• Garage and Basement
• Heating and Cooling
• Pesticides and Insecticides
• Pets
• Surrounding Area
• Walls and Flooring
• Water and Dampness
Chemical and Metal Exposures at Work
Hobby Exposures
Ultraviolet Light Exposures
Workplace Characteristics
Categories of survey questions administered to the participants in the Internal Exposome Survey (Exposome Survey - Part B) are provided.
Internal Exposome (Exposome B)
Chemotherapy/Radiation Therapy
Physical Activity
Dietary Behavior
Reproductive History (Females Only)
Dietary Intake
Sleep
Genetic History
Stress
Infectious Disease
Vitamins, Minerals, and Other Supplement Use
Medications
Twin/Triplet Siblings and Birth Order
Other
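Medication names reported in the Internal Exposome Survey are mapped to WHO ATC codes (see Data Components). ATC codes are hierarchical: a full seven-character code identifies a chemical substance, and its prefixes of length 1, 3, 4, and 5 identify progressively broader groups. The sketch below shows how a code could be rolled up to a chosen level for grouping. The example codes are illustrative and are not taken from PEGS data, and the helper names are hypothetical.

```python
from collections import Counter

# Number of characters that make up each ATC level.
ATC_LEVEL_LENGTHS = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def atc_prefix(code: str, level: int) -> str:
    """Return the prefix of an ATC code at the given level (1 to 5)."""
    code = code.strip().upper()
    length = ATC_LEVEL_LENGTHS[level]
    if len(code) < length:
        raise ValueError(f"{code!r} is shorter than ATC level {level}")
    return code[:length]


def atc_levels(code: str) -> dict:
    """Return every level that can be derived from the given code."""
    code = code.strip().upper()
    return {lvl: code[:n] for lvl, n in ATC_LEVEL_LENGTHS.items() if len(code) >= n}


if __name__ == "__main__":
    # A10BA02 is the ATC code for metformin.
    print(atc_levels("A10BA02"))
    # Count hypothetical reported medications by level-2 group.
    reported = ["A10BA02", "A10AB01", "C10AA05"]
    print(Counter(atc_prefix(c, 2) for c in reported))
```

Grouping at level 2 or 3 is a common way to summarize medication use when individual substances are too sparse to analyze on their own.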
Geospatial Data Summary
Source | Description | Examples
Geocodes (GIS) | Geocoded data from five participant-provided addresses: the address at initial enrollment, at completion of the Health & Exposure Survey, and at completion of the External Exposome Survey, plus the longest-lived childhood address and the longest-lived adult address reported in the External Exposome Survey. | Geographic coordinates (latitude and longitude) from multiple participant-provided addresses.
Hazards | Exposure estimates computed from Department of Transportation (DOT) data. | Information from train tracks, rail depots and roadways, such as total major roadway length, distance to nearest rail depot, etc.
Hazards | Exposure estimates computed from Federal Aviation Administration (FAA) data. | Information from aircraft departure and arrival sites, e.g., distance to nearest airport.
Hazards | Exposure estimates computed from Federal Communications Commission (FCC) data. | Information from cellular network towers, e.g., nearest cell tower.
Hazards | Exposure estimates computed from North Carolina Department of Environmental Quality (NCDEQ) data. | Distance to multi-pollutant point sources such as swine CAFOs, hazardous waste sites, hazardous spill sites, EPA Superfund sites, wastewater treatment plant release sites, etc.
Hazards | Exposure estimates computed from Nuclear Regulatory Commission (NRC) data. | Distance to nuclear power station.
Hazards | Exposure estimates computed from Atmospheric Composition Analysis Group (ACAG) data. | Particulate matter concentrations: PM2.5 total, PM2.5 sulfate, PM2.5 black carbon, etc.
Hazards | Exposure estimates computed from Center for Air, Climate, and Energy Solutions (CACES) data. | Concentrations for multiple pollutants such as carbon monoxide, nitrogen dioxide, ozone, etc.
Hazards | Exposure estimates computed from Toxics Release Inventory (TRI) data. | Emissions for chemicals of interest such as benzene, ethylbenzene, xylene, toluene, etc.
MERRA-2 Data (Earthdata) | Geospatial data linkages from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) project, which assimilates a range of satellite-based environmental observations into a consistent estimate of climate and environmental metrics. | Particulate, gas, meteorological, and health-relevant exposure indicators such as dust sedimentation, organic carbon emission bin, SO2 biomass burning emissions, sea-level pressure, etc.
Social Vulnerability Index (SVI) | Geospatial data linkages for CDC/ATSDR Social Vulnerability Index, designed to consistently quantify multiple social determinants of health across the United States over time. | Summaries of social determinants of health at the census tract level, including an overall index, four component indexes (socioeconomic status, household characteristics, racial and ethnic minority status, and housing type/transportation), and source variables used to compute each index component (e.g., poverty, education, overcrowding, access to vehicle, etc.).
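Several of the hazards measures above are proximity measures, such as distance to the nearest airport or rail depot, derived from geocoded addresses. As an illustration of the idea only, the sketch below computes the great-circle (haversine) distance from one coordinate pair to the nearest of a set of sites. This page does not state how PEGS computes its proximity measures, so the method, the function names, and the coordinates used here are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_site(lat, lon, sites):
    """Return (name, distance_km) for the closest site.

    `sites` is an iterable of (name, latitude, longitude) tuples.
    """
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon)) for name, s_lat, s_lon in sites),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Made-up coordinates, not real participant or facility locations.
    sites = [("site_a", 35.90, -78.80), ("site_b", 36.10, -79.90)]
    name, dist = nearest_site(35.95, -78.95, sites)
    print(f"nearest: {name} at {dist:.1f} km")
```

For a large number of addresses and sites, a spatial index or a projected coordinate system would typically replace this pairwise scan.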
All data on this website are reported from PEGS Data Freeze 3.1 created on 6/27/2023.
Last Reviewed: February 18, 2026