A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL

Tamar Sofer, Ruth Heller, Marina Bogomolov, Christy L. Avery, Mariaelisa Graff, Kari E. North, Alex P. Reiner, Timothy A. Thornton, Kenneth Rice, Yoav Benjamini, Cathy C. Laurie, Kathleen F. Kerr

Research output: Research - peer-review, Article

Abstract

In genome-wide association studies (GWAS), “generalization” is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism-(SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.
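
As a rough illustration of two ingredients the abstract mentions (selecting SNPs by a discovery P-value threshold, and orienting the follow-up test toward the direction seen in discovery via one-sided P-values), here is a minimal Python sketch. It does not compute the paper's r-values and does not by itself control FWERg or FDRg. The record layout, function names, and example numbers are hypothetical, and the one-sided conversion assumes a symmetric test statistic such as a z-test.

```python
from typing import List, NamedTuple


class SnpResult(NamedTuple):
    """Summary statistics for one SNP (hypothetical record layout)."""
    snp: str
    beta_discovery: float  # estimated effect in the discovery study
    p_discovery: float     # two-sided P-value in the discovery study
    beta_followup: float   # estimated effect in the follow-up study
    p_followup: float      # two-sided P-value in the follow-up study


def select_for_followup(results: List[SnpResult], threshold: float) -> List[SnpResult]:
    """Keep SNPs whose discovery P-value is below the selection threshold."""
    return [r for r in results if r.p_discovery < threshold]


def directional_p(r: SnpResult) -> float:
    """One-sided follow-up P-value in the direction of the discovery effect.

    Assumes a symmetric test statistic: the two-sided P-value is halved when
    the follow-up effect has the same sign as the discovery effect, and the
    complement of that half is used when the signs disagree.
    """
    same_direction = (r.beta_discovery > 0) == (r.beta_followup > 0)
    half = r.p_followup / 2.0
    return half if same_direction else 1.0 - half


# Made-up example values, not data from the study.
results = [
    SnpResult("snpA", 0.20, 1e-9, 0.10, 0.04),   # same direction in both studies
    SnpResult("snpB", -0.30, 1e-6, 0.05, 0.04),  # opposite directions
    SnpResult("snpC", 0.10, 1e-2, 0.10, 0.001),  # not selected at either threshold
]

strict = select_for_followup(results, 5e-8)     # genome-wide significance
lenient = select_for_followup(results, 6.6e-5)  # more lenient selection rule

for r in lenient:
    print(r.snp, directional_p(r))
```

In this sketch a follow-up result in the opposite direction receives a large one-sided P-value, so it cannot count as evidence of generalization no matter how small its two-sided P-value is.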

Language: English (US)
Pages: 251-258
Number of pages: 8
Journal: Genetic Epidemiology
Volume: 41
Issue number: 3
DOI: 10.1002/gepi.22029
State: Published - Apr 1 2017

Fingerprint

Genome-Wide Association Study
Hispanic Americans
Single Nucleotide Polymorphism
Health
Population
Genetic Association Studies
Cholesterol
Genome
Direction compound
Power (Psychology)

Keywords

  • multiple testing
  • one-sided P-values
  • shared genetics

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL. / Sofer, Tamar; Heller, Ruth; Bogomolov, Marina; Avery, Christy L.; Graff, Mariaelisa; North, Kari E.; Reiner, Alex P.; Thornton, Timothy A.; Rice, Kenneth; Benjamini, Yoav; Laurie, Cathy C.; Kerr, Kathleen F.

In: Genetic Epidemiology, Vol. 41, No. 3, 01.04.2017, p. 251-258.

Sofer, T, Heller, R, Bogomolov, M, Avery, CL, Graff, M, North, KE, Reiner, AP, Thornton, TA, Rice, K, Benjamini, Y, Laurie, CC & Kerr, KF 2017, 'A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL', Genetic Epidemiology, vol. 41, no. 3, pp. 251-258. DOI: 10.1002/gepi.22029
Sofer, Tamar; Heller, Ruth; Bogomolov, Marina; Avery, Christy L.; Graff, Mariaelisa; North, Kari E.; Reiner, Alex P.; Thornton, Timothy A.; Rice, Kenneth; Benjamini, Yoav; Laurie, Cathy C.; Kerr, Kathleen F. / A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL. In: Genetic Epidemiology. 2017; Vol. 41, No. 3. pp. 251-258.
@article{0138324106934a83a16514c5efcd8962,
title = "A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL",
abstract = "In genome-wide association studies (GWAS), “generalization” is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism-(SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.",
keywords = "multiple testing, one-sided P-values, shared genetics",
author = "Tamar Sofer and Ruth Heller and Marina Bogomolov and Avery, {Christy L.} and Mariaelisa Graff and North, {Kari E.} and Reiner, {Alex P.} and Thornton, {Timothy A.} and Kenneth Rice and Yoav Benjamini and Laurie, {Cathy C.} and Kerr, {Kathleen F.}",
year = "2017",
month = "4",
doi = "10.1002/gepi.22029",
volume = "41",
pages = "251--258",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY - JOUR

T1 - A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL

AU - Sofer,Tamar

AU - Heller,Ruth

AU - Bogomolov,Marina

AU - Avery,Christy L.

AU - Graff,Mariaelisa

AU - North,Kari E.

AU - Reiner,Alex P.

AU - Thornton,Timothy A.

AU - Rice,Kenneth

AU - Benjamini,Yoav

AU - Laurie,Cathy C.

AU - Kerr,Kathleen F.

PY - 2017/4/1

Y1 - 2017/4/1

N2 - In genome-wide association studies (GWAS), “generalization” is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism-(SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.

AB - In genome-wide association studies (GWAS), “generalization” is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism-(SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.

KW - multiple testing

KW - one-sided P-values

KW - shared genetics

UR - http://www.scopus.com/inward/record.url?scp=85010806721&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85010806721&partnerID=8YFLogxK

U2 - 10.1002/gepi.22029

DO - 10.1002/gepi.22029

M3 - Article

VL - 41

SP - 251

EP - 258

JO - Genetic Epidemiology

T2 - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 3

ER -