A statistical framework for pathway and gene identification from integrative analysis

Quefeng Li, Menggang Yu, Sijian Wang

Research output: Contribution to journal - Article

Abstract

In the era of big data, integrative analyses that pool data from different sources are now extensively conducted in order to improve performance. Among many interesting applications, genomics research is an area where integrative methods have become popular tools to identify prognostic biomarkers for various diseases. In this paper, we propose such a framework for pathway and gene identification. Our method employs a hierarchical decomposition on genes’ effects followed by a proper regularization to identify important pathways and genes across multiple studies. Asymptotic theories are provided to show that our method is both pathway and gene selection consistent. More importantly, we explicitly show that pathway selection consistency needs milder statistical conditions than gene selection consistency, as it allows false positives and negatives at the gene selection level. Finite-sample performance of our method is shown to be superior to other ad hoc methods in various simulation studies. We further apply our method to analyze five cardiovascular disease studies. Our method is intrinsically a general method for group-wise and element-wise selection in integrative analysis, which can have other applications beyond genomic research.
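The abstract's point that pathway selection consistency tolerates gene-level false positives and negatives can be made concrete with a toy check. This is a minimal sketch, not the authors' estimator: it assumes, for illustration only, that a pathway counts as selected when at least one of its genes has a nonzero effect in at least one study. The helper names `selected_genes` and `selected_pathways`, the pathway and gene labels, and all coefficients are hypothetical and made up.

```python
# Hypothetical illustration of pathway-level vs gene-level selection.
# Assumption (not taken from the paper): a pathway is "selected" when at least
# one of its genes has a nonzero estimated effect in at least one study.
from typing import Dict, List, Set


def selected_genes(coefs: Dict[str, List[float]], tol: float = 0.0) -> Set[str]:
    """Return genes whose effect exceeds tol in absolute value in any study."""
    return {g for g, betas in coefs.items() if any(abs(b) > tol for b in betas)}


def selected_pathways(pathways: Dict[str, List[str]], genes: Set[str]) -> Set[str]:
    """Return pathways that contain at least one selected gene."""
    return {p for p, members in pathways.items() if any(g in genes for g in members)}


# Toy setting: two pathways, three studies (all numbers are made up).
pathways = {"P1": ["g1", "g2", "g3"], "P2": ["g4", "g5"]}
truth = {"g1": [0.8, 0.5, 0.0], "g2": [0.0, 0.0, 0.0], "g3": [0.3, 0.0, 0.2],
         "g4": [0.0, 0.0, 0.0], "g5": [0.0, 0.0, 0.0]}
# A hypothetical estimate whose gene-level errors stay inside P1:
# g2 is a false positive and g3 is a false negative.
estimate = {"g1": [0.7, 0.4, 0.0], "g2": [0.1, 0.0, 0.0], "g3": [0.0, 0.0, 0.0],
            "g4": [0.0, 0.0, 0.0], "g5": [0.0, 0.0, 0.0]}

true_genes = selected_genes(truth)
est_genes = selected_genes(estimate)
print(sorted(true_genes), sorted(est_genes))  # ['g1', 'g3'] ['g1', 'g2']
print(sorted(selected_pathways(pathways, true_genes)),
      sorted(selected_pathways(pathways, est_genes)))  # ['P1'] ['P1']
```

The estimated gene set is wrong, yet both errors fall inside P1, so the selected pathway set still matches the truth exactly. A single false positive in P2, by contrast, would add P2 and break pathway-level recovery.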

Language: English (US)
Pages: 1-17
Number of pages: 17
Journal: Journal of Multivariate Analysis
Volume: 156
DOI: 10.1016/j.jmva.2016.12.005
State: Published - Apr 1 2017

Keywords

  • Gene and pathway
  • High dimensional analysis
  • Integrative analysis
  • Variable selection

ASJC Scopus subject areas

  • Statistics and Probability
  • Numerical Analysis
  • Statistics, Probability and Uncertainty

Cite this

A statistical framework for pathway and gene identification from integrative analysis. / Li, Quefeng; Yu, Menggang; Wang, Sijian.

In: Journal of Multivariate Analysis, Vol. 156, 01.04.2017, p. 1-17.


@article{1c284cf23f68403e820d82c4e15ba40d,
title = "A statistical framework for pathway and gene identification from integrative analysis",
abstract = "In the era of big data, integrative analyses that pool data from different sources are now extensively conducted in order to improve performance. Among many interesting applications, genomics research is an area where integrative methods become popular tools to identify prognostic biomarkers for various diseases. In this paper, we propose such a framework for pathway and gene identification. Our method employs a hierarchical decomposition on genes’ effects followed by a proper regularization to identify important pathways and genes across multiple studies. Asymptotic theories are provided to show that our method is both pathway and gene selection consistent. More importantly, we explicitly show that pathway selection consistency needs milder statistical conditions than gene selection consistency, as it would allow false positives and negatives at the gene selection level. Finite-sample performance of our method is shown to be superior than other ad hoc methods in various simulation studies. We further apply our method to analyze five cardiovascular disease studies. Our method is intrinsically a general method on group-wise and element-wise selections from integrative analysis, which can have other applications beyond genomic research.",
keywords = "Gene and pathway, High dimensional analysis, Integrative analysis, Variable selection",
author = "Quefeng Li and Menggang Yu and Sijian Wang",
year = "2017",
month = "4",
day = "1",
doi = "10.1016/j.jmva.2016.12.005",
language = "English (US)",
volume = "156",
pages = "1--17",
journal = "Journal of Multivariate Analysis",
issn = "0047-259X",
publisher = "Academic Press Inc.",

}

TY  - JOUR
T1  - A statistical framework for pathway and gene identification from integrative analysis
AU  - Li, Quefeng
AU  - Yu, Menggang
AU  - Wang, Sijian
PY  - 2017/4/1
Y1  - 2017/4/1
N2  - In the era of big data, integrative analyses that pool data from different sources are now extensively conducted in order to improve performance. Among many interesting applications, genomics research is an area where integrative methods become popular tools to identify prognostic biomarkers for various diseases. In this paper, we propose such a framework for pathway and gene identification. Our method employs a hierarchical decomposition on genes’ effects followed by a proper regularization to identify important pathways and genes across multiple studies. Asymptotic theories are provided to show that our method is both pathway and gene selection consistent. More importantly, we explicitly show that pathway selection consistency needs milder statistical conditions than gene selection consistency, as it would allow false positives and negatives at the gene selection level. Finite-sample performance of our method is shown to be superior than other ad hoc methods in various simulation studies. We further apply our method to analyze five cardiovascular disease studies. Our method is intrinsically a general method on group-wise and element-wise selections from integrative analysis, which can have other applications beyond genomic research.
AB  - In the era of big data, integrative analyses that pool data from different sources are now extensively conducted in order to improve performance. Among many interesting applications, genomics research is an area where integrative methods become popular tools to identify prognostic biomarkers for various diseases. In this paper, we propose such a framework for pathway and gene identification. Our method employs a hierarchical decomposition on genes’ effects followed by a proper regularization to identify important pathways and genes across multiple studies. Asymptotic theories are provided to show that our method is both pathway and gene selection consistent. More importantly, we explicitly show that pathway selection consistency needs milder statistical conditions than gene selection consistency, as it would allow false positives and negatives at the gene selection level. Finite-sample performance of our method is shown to be superior than other ad hoc methods in various simulation studies. We further apply our method to analyze five cardiovascular disease studies. Our method is intrinsically a general method on group-wise and element-wise selections from integrative analysis, which can have other applications beyond genomic research.
KW  - Gene and pathway
KW  - High dimensional analysis
KW  - Integrative analysis
KW  - Variable selection
UR  - http://www.scopus.com/inward/record.url?scp=85011588296&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85011588296&partnerID=8YFLogxK
U2  - 10.1016/j.jmva.2016.12.005
DO  - 10.1016/j.jmva.2016.12.005
M3  - Article
VL  - 156
SP  - 1
EP  - 17
JO  - Journal of Multivariate Analysis
T2  - Journal of Multivariate Analysis
JF  - Journal of Multivariate Analysis
SN  - 0047-259X
ER  - 