A small-sample multivariate kernel machine test for microbiome association studies

Xiang Zhan, Xingwei Tong, Ni Zhao, Arnab Maity, Michael C. Wu, Jun Chen

Research output: Contribution to journal › Article

Abstract

High-throughput sequencing technologies have enabled large-scale studies of the role of the human microbiome in health conditions and diseases. Community-level microbial association testing, a critical step in connecting overall microbiome composition to an outcome of interest, is now routinely performed in many studies. However, existing microbiome association tests all focus on a single outcome. It has become increasingly common for a microbiome study to collect multiple, possibly related, outcomes to maximize the power of discovery. Because these outcomes may share common mechanisms, analyzing them jointly can amplify the association signal and improve statistical power to detect potential associations. We propose the multivariate microbiome regression-based kernel association test (MMiRKAT) for testing association between multiple continuous outcomes and overall microbiome composition, where the kernel used in MMiRKAT is based on Bray-Curtis or UniFrac distance. MMiRKAT directly regresses all outcomes on the microbiome profiles via a semiparametric kernel machine regression framework, which allows for covariate adjustment, and evaluates the association via a variance-component score test. Because most current microbiome studies have small sample sizes, MMiRKAT implements a novel small-sample correction procedure that corrects for the conservativeness of the association test when the sample size is small or moderate. The proposed method is assessed via simulation studies and an application to a real data set examining the association between host gene expression and mucosal microbiome composition. We demonstrate that MMiRKAT is more powerful than the large-sample-based multivariate kernel association test while still controlling the type I error rate. A free implementation of MMiRKAT in the R language is available at http://research.fhcrc.org/wu/en.html.
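The pipeline the abstract describes (distance, then kernel, then a joint score statistic over all outcomes) can be illustrated with a small Python sketch. This is not the MMiRKAT R implementation, and several pieces are assumptions rather than details from the paper: Bray-Curtis is computed on raw abundance vectors; the kernel uses the Gower-centering transform `K = -1/2 J D^2 J` with `J = I - 11'/n`, as in MiRKAT-style tests; the statistic is the trace form `Q = tr(R' K R)` on mean-centered outcomes (intercept-only null, no covariates); and a permutation p-value stands in for the paper's small-sample correction, which is not reproduced here. All function names are hypothetical.

```python
"""Sketch of a distance-based multivariate kernel association test.

ASSUMPTIONS (not taken from the MMiRKAT paper or its R code):
- Bray-Curtis dissimilarity on raw abundance vectors.
- Kernel K = -1/2 * J D^2 J, J = I - 11'/n (Gower centering).
- Statistic Q = trace(R' K R), R = column-centered outcomes (no covariates).
- Permutation p-value used in place of the paper's small-sample correction.
"""
import random


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two nonnegative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def gower_kernel(dist):
    """Return K = -1/2 * J D^2 J by double-centering the squared distances."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]
    col = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[-0.5 * (d2[i][j] - row[i] - col[j] + tot) for j in range(n)]
            for i in range(n)]


def center_columns(y):
    """Residuals under an intercept-only null: subtract each outcome's mean."""
    n, p = len(y), len(y[0])
    means = [sum(y[i][k] for i in range(n)) / n for k in range(p)]
    return [[y[i][k] - means[k] for k in range(p)] for i in range(n)]


def trace_stat(kernel, resid):
    """Q = trace(R' K R) = sum over outcome columns k of r_k' K r_k."""
    n, p = len(resid), len(resid[0])
    q = 0.0
    for k in range(p):
        r = [resid[i][k] for i in range(n)]
        q += sum(r[i] * kernel[i][j] * r[j] for i in range(n) for j in range(n))
    return q


def permutation_pvalue(kernel, y, n_perm=999, seed=1):
    """Permute whole outcome rows (keeping cross-outcome correlation)."""
    rng = random.Random(seed)
    resid = center_columns(y)
    observed = trace_stat(kernel, resid)
    idx = list(range(len(y)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        permuted = [resid[i] for i in idx]
        if trace_stat(kernel, permuted) >= observed:
            hits += 1
    # The +1 terms count the observed labeling as one permutation.
    return observed, (hits + 1) / (n_perm + 1)
```

For example, with `counts` as a list of per-sample abundance vectors and `outcomes` as a list of per-sample outcome vectors, one would compute `D = [[bray_curtis(a, b) for b in counts] for a in counts]`, then `K = gower_kernel(D)`, then `q, p = permutation_pvalue(K, outcomes)`. Permuting whole outcome rows preserves the correlation among outcomes, which is what a joint test relies on. With covariates, plain row permutation is no longer exact, and a Bray-Curtis kernel may also need to be made positive semidefinite (for example by dropping negative eigenvalues) before use; the sketch skips both refinements.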

Language: English (US)
Pages: 210-220
Number of pages: 11
Journal: Genetic Epidemiology
Volume: 41
Issue number: 3
DOI: 10.1002/gepi.22030
State: Published - Apr 1 2017

Keywords

  • Bray-Curtis
  • kernel association test
  • multivariate outcomes
  • small sample
  • UniFrac

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

A small-sample multivariate kernel machine test for microbiome association studies. / Zhan, Xiang; Tong, Xingwei; Zhao, Ni; Maity, Arnab; Wu, Michael C.; Chen, Jun.

In: Genetic Epidemiology, Vol. 41, No. 3, 01.04.2017, p. 210-220.

Zhan, X, Tong, X, Zhao, N, Maity, A, Wu, MC & Chen, J 2017, 'A small-sample multivariate kernel machine test for microbiome association studies', Genetic Epidemiology, vol. 41, no. 3, pp. 210-220. DOI: 10.1002/gepi.22030
Zhan X, Tong X, Zhao N, Maity A, Wu MC, Chen J. A small-sample multivariate kernel machine test for microbiome association studies. Genetic Epidemiology. 2017 Apr 1;41(3):210-220. doi: 10.1002/gepi.22030
Zhan, Xiang ; Tong, Xingwei ; Zhao, Ni ; Maity, Arnab ; Wu, Michael C. ; Chen, Jun. / A small-sample multivariate kernel machine test for microbiome association studies. In: Genetic Epidemiology. 2017 ; Vol. 41, No. 3. pp. 210-220
@article{59e8d22bae80478bb8f33d5e9851d141,
title = "A small-sample multivariate kernel machine test for microbiome association studies",
keywords = "Bray-Curtis, kernel association test, multivariate outcomes, small sample, UniFrac",
author = "Xiang Zhan and Xingwei Tong and Ni Zhao and Arnab Maity and Wu, {Michael C.} and Jun Chen",
year = "2017",
month = "4",
day = "1",
doi = "10.1002/gepi.22030",
language = "English (US)",
volume = "41",
pages = "210--220",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY  - JOUR
T1  - A small-sample multivariate kernel machine test for microbiome association studies
AU  - Zhan, Xiang
AU  - Tong, Xingwei
AU  - Zhao, Ni
AU  - Maity, Arnab
AU  - Wu, Michael C.
AU  - Chen, Jun
PY  - 2017/4/1
Y1  - 2017/4/1
KW  - Bray-Curtis
KW  - kernel association test
KW  - multivariate outcomes
KW  - small sample
KW  - UniFrac
UR  - http://www.scopus.com/inward/record.url?scp=85014518911&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85014518911&partnerID=8YFLogxK
U2  - 10.1002/gepi.22030
DO  - 10.1002/gepi.22030
M3  - Article
VL  - 41
SP  - 210
EP  - 220
JO  - Genetic Epidemiology
T2  - Genetic Epidemiology
JF  - Genetic Epidemiology
SN  - 0741-0395
IS  - 3
ER  -