Energy landscape for large average submatrix detection problems in Gaussian random matrices

Research output: Research - peer-review; Article

Abstract

The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from a variety of disciplines, ranging from genomics to social sciences. In this paper we provide a detailed asymptotic analysis of large average submatrices of an (Formula presented.) Gaussian random matrix. The first part of the paper addresses global maxima. For fixed k we identify the average and the joint distribution of the (Formula presented.) submatrix having largest average value. As a dual result, we establish that the size of the largest square submatrix with average bigger than a fixed positive constant is, with high probability, equal to one of two consecutive integers that depend on the threshold and the matrix dimension n. The second part of the paper addresses local maxima. Specifically we consider submatrices with dominant row and column sums that arise as the local optima of iterative search procedures for large average submatrices. For fixed k, we identify the limiting average value and joint distribution of a (Formula presented.) submatrix conditioned to be a local maximum. In order to understand the density of such local optima and explain the quick convergence of such iterative procedures, we analyze the number (Formula presented.) of local maxima, beginning with exact asymptotic expressions for the mean and fluctuation behavior of (Formula presented.). For fixed k, the mean of (Formula presented.) is (Formula presented.) while the standard deviation is (Formula presented.). Our principal result is a Gaussian central limit theorem for (Formula presented.) that is based on a new variant of Stein’s method.
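The notion of a local maximum described above can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes that "dominant row and column sums" means the chosen k rows have the k largest sums over the chosen columns, and the chosen k columns have the k largest sums over the chosen rows. It also assumes a simple alternating update as the iterative search. All function names (`gaussian_matrix`, `iterative_search`, `is_local_max`, and so on) are hypothetical and introduced only for illustration.

```python
import random


def gaussian_matrix(n, rng):
    """n x n matrix of independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]


def top_k(scores, k):
    """Indices of the k largest scores, returned as a sorted list."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def best_rows(W, cols, k):
    """The k rows with the largest sums over the given columns."""
    return top_k([sum(row[j] for j in cols) for row in W], k)


def best_cols(W, rows, k):
    """The k columns with the largest sums over the given rows."""
    n = len(W[0])
    return top_k([sum(W[i][j] for i in rows) for j in range(n)], k)


def iterative_search(W, k, rng, max_iter=1000):
    """Alternate row and column updates from a random start (assumed scheme)."""
    cols = sorted(rng.sample(range(len(W[0])), k))
    rows = []
    for _ in range(max_iter):
        new_rows = best_rows(W, cols, k)
        new_cols = best_cols(W, new_rows, k)
        if new_rows == rows and new_cols == cols:
            break  # fixed point: neither update changes the submatrix
        rows, cols = new_rows, new_cols
    return rows, cols


def is_local_max(W, rows, cols):
    """Check the assumed row- and column-dominance condition."""
    k = len(rows)
    return best_rows(W, cols, k) == rows and best_cols(W, rows, k) == cols


def average(W, rows, cols):
    """Average of the entries in the submatrix indexed by rows x cols."""
    return sum(W[i][j] for i in rows for j in cols) / (len(rows) * len(cols))


rng = random.Random(0)
W = gaussian_matrix(40, rng)
rows, cols = iterative_search(W, 3, rng)
print(rows, cols, average(W, rows, cols), is_local_max(W, rows, cols))
```

Each update can only increase the submatrix sum, and there are finitely many k x k submatrices, so with continuous entries (no ties, almost surely) the search stops at a submatrix satisfying the dominance condition. Running it from many random starts on the same matrix gives a rough empirical picture of how many distinct local maxima exist.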

Language: English (US)
Pages: 1-65
Number of pages: 65
Journal: Probability Theory and Related Fields
DOI: 10.1007/s00440-017-0766-0
State: Accepted/In press - Mar 9 2017

Fingerprint

Energy Landscape
Random Matrices
Energy
Joint distribution
Standard deviation
Asymptotic analysis
Social sciences
Central limit theorem
Fluctuations
Integer
Exploratory Analysis
Value Distribution
Iterative Procedure
Genomics
Consecutive
Limiting

Keywords

  • Central limit theorem
  • Energy landscape
  • Extreme value theory
  • Stein’s method

ASJC Scopus subject areas

  • Analysis
  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

@article{22bf221fa2d04b71a7bd90a745340a72,
title = "Energy landscape for large average submatrix detection problems in Gaussian random matrices",
abstract = "The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from a variety of disciplines, ranging from genomics to social sciences. In this paper we provide a detailed asymptotic analysis of large average submatrices of an (Formula presented.) Gaussian random matrix. The first part of the paper addresses global maxima. For fixed k we identify the average and the joint distribution of the (Formula presented.) submatrix having largest average value. As a dual result, we establish that the size of the largest square submatrix with average bigger than a fixed positive constant is, with high probability, equal to one of two consecutive integers that depend on the threshold and the matrix dimension n. The second part of the paper addresses local maxima. Specifically we consider submatrices with dominant row and column sums that arise as the local optima of iterative search procedures for large average submatrices. For fixed k, we identify the limiting average value and joint distribution of a (Formula presented.) submatrix conditioned to be a local maximum. In order to understand the density of such local optima and explain the quick convergence of such iterative procedures, we analyze the number (Formula presented.) of local maxima, beginning with exact asymptotic expressions for the mean and fluctuation behavior of (Formula presented.). For fixed k, the mean of (Formula presented.) is (Formula presented.) while the standard deviation is (Formula presented.). Our principal result is a Gaussian central limit theorem for (Formula presented.) that is based on a new variant of Stein’s method.",
keywords = "Central limit theorem, Energy landscape, Extreme value theory, Stein’s method",
author = "Bhamidi, Shankar and Dey, {Partha S.} and Nobel, {Andrew B.}",
year = "2017",
month = "3",
doi = "10.1007/s00440-017-0766-0",
pages = "1--65",
journal = "Probability Theory and Related Fields",
issn = "0178-8051",
publisher = "Springer New York",

}

TY - JOUR

T1 - Energy landscape for large average submatrix detection problems in Gaussian random matrices

AU - Bhamidi,Shankar

AU - Dey,Partha S.

AU - Nobel,Andrew B.

PY - 2017/3/9

Y1 - 2017/3/9

AB - The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from a variety of disciplines, ranging from genomics to social sciences. In this paper we provide a detailed asymptotic analysis of large average submatrices of an (Formula presented.) Gaussian random matrix. The first part of the paper addresses global maxima. For fixed k we identify the average and the joint distribution of the (Formula presented.) submatrix having largest average value. As a dual result, we establish that the size of the largest square submatrix with average bigger than a fixed positive constant is, with high probability, equal to one of two consecutive integers that depend on the threshold and the matrix dimension n. The second part of the paper addresses local maxima. Specifically we consider submatrices with dominant row and column sums that arise as the local optima of iterative search procedures for large average submatrices. For fixed k, we identify the limiting average value and joint distribution of a (Formula presented.) submatrix conditioned to be a local maximum. In order to understand the density of such local optima and explain the quick convergence of such iterative procedures, we analyze the number (Formula presented.) of local maxima, beginning with exact asymptotic expressions for the mean and fluctuation behavior of (Formula presented.). For fixed k, the mean of (Formula presented.) is (Formula presented.) while the standard deviation is (Formula presented.). Our principal result is a Gaussian central limit theorem for (Formula presented.) that is based on a new variant of Stein’s method.

KW - Central limit theorem

KW - Energy landscape

KW - Extreme value theory

KW - Stein’s method

UR - http://www.scopus.com/inward/record.url?scp=85014692118&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014692118&partnerID=8YFLogxK

U2 - 10.1007/s00440-017-0766-0

DO - 10.1007/s00440-017-0766-0

M3 - Article

SP - 1

EP - 65

JO - Probability Theory and Related Fields

T2 - Probability Theory and Related Fields

JF - Probability Theory and Related Fields

SN - 0178-8051

ER -