A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL

Tamar Sofer, Ruth Heller, Marina Bogomolov, Christy L Avery, Mariaelisa Graff, Kari E North, Alex P. Reiner, Timothy A. Thornton, Kenneth Rice, Yoav Benjamini, Cathy C. Laurie, Kathleen F. Kerr

Research output: Contribution to journal (Article)

Abstract

In genome-wide association studies (GWAS), “generalization” is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of single nucleotide polymorphism (SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.
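
To make two ingredients of the abstract concrete (selecting SNPs by a discovery P-value threshold, and using follow-up evidence that respects the direction of the discovery association), here is a minimal Python sketch. It is not the r-value procedure from the paper and does not carry its FWERg or FDRg guarantees: it simply applies a Bonferroni cutoff to one-sided follow-up P-values among the selected SNPs. The class `Snp`, the functions `one_sided_p` and `naive_generalization`, and all numeric values are made-up illustrations, and the halving rule assumes two-sided P-values from a symmetric test statistic such as a z-test.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str           # hypothetical identifier
    beta_disc: float    # effect estimate in the discovery study
    p_disc: float       # two-sided P-value in the discovery study
    beta_follow: float  # effect estimate in the follow-up study
    p_follow: float     # two-sided P-value in the follow-up study


def one_sided_p(snp: Snp) -> float:
    """Follow-up P-value for the direction observed in discovery.

    Assumes a symmetric test statistic, so a two-sided P-value p maps to
    p/2 when the directions agree and 1 - p/2 when they disagree.
    """
    same_direction = (snp.beta_disc > 0) == (snp.beta_follow > 0)
    return snp.p_follow / 2 if same_direction else 1 - snp.p_follow / 2


def naive_generalization(snps, select_threshold, alpha=0.05):
    """Select SNPs by discovery P-value, then Bonferroni-test them in follow-up."""
    selected = [s for s in snps if s.p_disc < select_threshold]
    if not selected:
        return []
    cutoff = alpha / len(selected)  # Bonferroni over the selected SNPs only
    return [s.name for s in selected if one_sided_p(s) <= cutoff]


# Toy data, invented for illustration only.
snps = [
    Snp("snp_a", 0.30, 1e-9, 0.25, 1e-4),   # strong in both, same direction
    Snp("snp_b", 0.20, 1e-6, 0.15, 2e-3),   # misses genome-wide significance
    Snp("snp_c", -0.10, 1e-5, 0.12, 1e-3),  # opposite directions
    Snp("snp_d", 0.05, 0.2, 0.04, 0.01),    # not selected at either threshold
]

strict = naive_generalization(snps, select_threshold=5e-8)
lenient = naive_generalization(snps, select_threshold=6.6e-5)
print(strict, lenient)
```

In this toy input the lenient threshold admits `snp_b`, which passes the follow-up cutoff even though more SNPs are being tested, while `snp_c` is rejected because its follow-up effect points the other way. That mirrors, in a very simplified form, the abstract's observation that a more lenient selection threshold can yield more generalized associations.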

Language: English (US)
Pages: 251-258
Number of pages: 8
Journal: Genetic Epidemiology
Volume: 41
Issue number: 3
DOI: https://doi.org/10.1002/gepi.22029
State: Published - Apr 1 2017

Fingerprint

Genome-Wide Association Study
Hispanic Americans
Single Nucleotide Polymorphism
Health
Genetic Association Studies
Population
Cholesterol
Genome

Keywords

  • multiple testing
  • one-sided P-values
  • shared genetics

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL. / Sofer, Tamar; Heller, Ruth; Bogomolov, Marina; Avery, Christy L; Graff, Mariaelisa; North, Kari E; Reiner, Alex P.; Thornton, Timothy A.; Rice, Kenneth; Benjamini, Yoav; Laurie, Cathy C.; Kerr, Kathleen F.

In: Genetic epidemiology, Vol. 41, No. 3, 01.04.2017, p. 251-258.

Sofer, T, Heller, R, Bogomolov, M, Avery, CL, Graff, M, North, KE, Reiner, AP, Thornton, TA, Rice, K, Benjamini, Y, Laurie, CC & Kerr, KF 2017, 'A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL', Genetic Epidemiology, vol. 41, no. 3, pp. 251-258. https://doi.org/10.1002/gepi.22029
@article{0138324106934a83a16514c5efcd8962,
title = "A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL",
keywords = "multiple testing, one-sided P-values, shared genetics",
author = "Tamar Sofer and Ruth Heller and Marina Bogomolov and Avery, {Christy L} and Mariaelisa Graff and North, {Kari E} and Reiner, {Alex P.} and Thornton, {Timothy A.} and Kenneth Rice and Yoav Benjamini and Laurie, {Cathy C.} and Kerr, {Kathleen F.}",
year = "2017",
month = "4",
day = "1",
doi = "10.1002/gepi.22029",
language = "English (US)",
volume = "41",
pages = "251--258",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY  - JOUR
T1  - A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL
AU  - Sofer, Tamar
AU  - Heller, Ruth
AU  - Bogomolov, Marina
AU  - Avery, Christy L
AU  - Graff, Mariaelisa
AU  - North, Kari E
AU  - Reiner, Alex P.
AU  - Thornton, Timothy A.
AU  - Rice, Kenneth
AU  - Benjamini, Yoav
AU  - Laurie, Cathy C.
AU  - Kerr, Kathleen F.
PY  - 2017/4/1
Y1  - 2017/4/1
KW  - multiple testing
KW  - one-sided P-values
KW  - shared genetics
UR  - http://www.scopus.com/inward/record.url?scp=85010806721&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85010806721&partnerID=8YFLogxK
U2  - 10.1002/gepi.22029
DO  - 10.1002/gepi.22029
M3  - Article
VL  - 41
SP  - 251
EP  - 258
JO  - Genetic Epidemiology
T2  - Genetic Epidemiology
JF  - Genetic Epidemiology
SN  - 0741-0395
IS  - 3
ER  -