Genetic studies have been conducted predominantly in cohorts of individuals of European ancestry. By 2010, there were approximately ten times as many published genome-wide association studies (GWAS) in people of European ancestry as in people of all other ancestries combined. This research disparity has led to an uneven understanding of the genetic basis of disease in Europeans and non-Europeans. 23andMe's web-based, large-scale research model is ideal for scaling genetics research within non-European populations and bringing more parity to genetic research. With our previous SBIR grant (2R44HG006981), we expanded 23andMe's tools for mining the genetic and phenotypic information in the database, largely through improved survey data collection and better management and computational tools. This proposal will extend 23andMe's infrastructure for association studies to leverage the variation found in admixed individuals, that is, individuals with ancestry from multiple continents. We will develop a pipeline for large-scale, computationally efficient admixture mapping that allows us to interrogate the variation in admixed genomes. Admixture mapping looks for regions of the genome that show an enrichment of one ancestral background in individuals with a disease, which indicates the presence of risk variants that differ in frequency among the ancestral populations. 23andMe's genotype-phenotype database, derived from over 800,000 research-consented customers, includes data from over 68,000 Latinos and 34,000 African Americans. In this proposal, we aim to harness genetic admixture to drive the discovery of disease variants found in non-Europeans, especially variants of African and Native American origin found in the admixed genomes of African Americans and Latinos.
Unlike previous admixture mapping studies, which typically relied on small sets of ancestry-informative markers, we will leverage 23andMe's fine-scale local ancestry estimates and growing genotype-phenotype database for admixture mapping in thousands of individuals across dozens of phenotypes. We will implement an admixture mapping pipeline that builds on our existing infrastructure for fine-scale local ancestry inference, and validate the pipeline through replication of a previous hit (Aim #1). We will apply our pipeline to several diseases that either have been previously targeted by admixture mapping studies or have been identified by the CDC as contributing to growing health disparities among groups; this will require determining which combination of cohorts and ancestries gives the greatest power to find genetic associations (Aim #2). Lastly, we will follow up on our top novel target to both validate the finding and fine-map it (Aim #3). This proposal will expand 23andMe's research pipeline to include admixture mapping and improve the understanding of global genetic variation underlying diseases and traits. Key commercial outcomes of the research are novel genetic targets for internal and external therapeutic development. The long-term aim is to improve understanding of disease in minority populations, which we hope may eventually lead to improved treatment of disease in these historically medically understudied groups.
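As an illustration of the admixture mapping idea described above, the following minimal Python sketch scans a handful of loci for enrichment of one ancestry in cases relative to controls. It is not 23andMe's pipeline: the function name `admixture_scan`, the toy data, and the choice of a simple two-sample z-statistic are all assumptions made for illustration. A real analysis would typically also adjust for each individual's genome-wide ancestry proportion and other covariates, and would correct for multiple testing.

```python
import math
from statistics import mean, variance


def admixture_scan(local_ancestry, is_case):
    """Hypothetical helper: per-locus case/control ancestry comparison.

    local_ancestry[i][j] is the number of copies (0, 1, or 2) of one
    ancestry (e.g. African) that individual i carries at locus j.
    is_case[i] is True for cases and False for controls.
    Returns one z-score per locus; a large positive value means the
    ancestry is enriched in cases at that locus.
    """
    n_loci = len(local_ancestry[0])
    z_scores = []
    for j in range(n_loci):
        cases = [row[j] for row, c in zip(local_ancestry, is_case) if c]
        controls = [row[j] for row, c in zip(local_ancestry, is_case) if not c]
        diff = mean(cases) - mean(controls)
        # Standard error of the difference in means (unequal variances).
        se = math.sqrt(variance(cases) / len(cases)
                       + variance(controls) / len(controls))
        z_scores.append(diff / se if se > 0 else 0.0)
    return z_scores


# Toy data (fabricated for illustration only): 6 cases, then 6 controls,
# each with ancestry dosages at 3 loci.
toy_ancestry = [
    [1, 2, 1], [0, 2, 1], [1, 1, 0], [2, 2, 1], [1, 1, 2], [1, 2, 1],  # cases
    [1, 0, 1], [1, 1, 1], [0, 0, 0], [2, 1, 2], [1, 0, 1], [1, 1, 1],  # controls
]
toy_status = [True] * 6 + [False] * 6

z_scores = admixture_scan(toy_ancestry, toy_status)
top_locus = max(range(len(z_scores)), key=lambda j: z_scores[j])
print("z-scores per locus:", z_scores)
print("locus with strongest enrichment in cases:", top_locus)
```

In the toy data, only the middle locus (index 1) was constructed with an ancestry difference between cases and controls, so it is the one the scan flags. At the other two loci the mean dosage is identical in both groups.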
Public Health Relevance Statement: Genome-wide association studies have yielded many discoveries of genes associated with disease, but largely in European populations. This proposal aims to discover new associations of genetic variants from globally diverse backgrounds by developing methods that take advantage of information in genetically admixed 23andMe research participants. This study will enable 23andMe to improve knowledge of disease variants found in non-Europeans, especially variants of African and Native American origin found today in African Americans and Latinos.
Project Terms: 6q21; Admixture; African American; American; Asthma; base; Biological; Cataloging; Catalogs; Centers for Disease Control and Prevention (U.S.); cohort; computerized tools; Consent; Coronary heart disease; Data; Data Analyses; Data Collection; Data Set; database of Genotypes and Phenotypes; Databases; Diabetes Mellitus; Disease; Ethnicity aspects; European; follow-up; Frequencies (time pattern); gene discovery; Genes; Genetic; genetic association; Genetic Markers; Genetic Research; Genetic study; genetic variant; Genome; genome annotation; genome wide association study; Grant; Haplotypes; health disparity; Hypertension; improved; Individual; Knowledge; Latino; Lead; Literature; malignant breast neoplasm; Malignant neoplasm of prostate; Maps; Methods; Mining; Minority; Modeling; Multiple Sclerosis; Native Americans; novel; Obesity; Online Systems; Outcomes Research; parity; Participant; Phenotype; Population; Positioning Attribute; Premature Birth; public health relevance; Publishing; Research; Research Infrastructure; Risk; risk variant; Small Business Innovation Research Grant; stroke; Surveys; therapeutic development; Time; tool; trait; Validation; Variant; Variation (Genetics); Work