Huang Lin

Conference 2022 Live Talk


Talk title

Sparse estimation of correlations among microbiomes


Authors and Affiliations

Huang Lin¹, Shyamal Das Peddada¹

1. Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), Bethesda, MD, USA




It is well known that the human gut microbiota form a complex ecology in which microbes interact with each other to maintain a healthy ecosystem. External factors can disrupt an ecosystem not only by changing the abundance or diversity of its microbes, but also by changing the interactions or correlations among them. There is considerable literature on evaluating the differential abundance of microbes or their diversity in an ecosystem, but methods for quantifying interactions or correlations between a pair of microbes are not well developed. Furthermore, microbes do not necessarily interact or correlate linearly.


We introduce a new methodology called Sparse Estimation of Correlations among Microbiomes (SECOM), which is appropriate for both linear and non-linear relationships between a pair of taxa. The method overcomes the compositionality issue by explicitly incorporating the estimation of the library-specific sampling fraction and the taxon-specific sequencing efficiency into the model. The sparsity and positive definiteness of the estimated correlation matrix are obtained either by a data-driven thresholding approach or by p-value filtering.
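As a rough illustration of how thresholding produces a sparse correlation matrix, the following minimal sketch computes plain Pearson correlations and sets weak off-diagonal entries to zero. It is not the SECOM procedure: it omits the sampling-fraction and sequencing-efficiency estimation, the cutoff `tau` is a fixed hypothetical value rather than a data-driven one, and the three taxa are made-up toy data.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise Pearson correlations; each column is one taxon across samples."""
    p = len(columns)
    return [[1.0 if i == j else pearson(columns[i], columns[j])
             for j in range(p)] for i in range(p)]

def hard_threshold(corr, tau):
    """Zero out off-diagonal entries whose absolute value is below tau."""
    p = len(corr)
    return [[corr[i][j] if i == j or abs(corr[i][j]) >= tau else 0.0
             for j in range(p)] for i in range(p)]

# Toy abundances (hypothetical): six samples, three taxa.
taxon_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
taxon_b = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # nearly linear in taxon_a
taxon_c = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]    # only weakly related to the others

corr = correlation_matrix([taxon_a, taxon_b, taxon_c])
sparse = hard_threshold(corr, tau=0.3)  # tau is an assumed, fixed cutoff

for row in sparse:
    print([round(v, 3) for v in row])
```

In this toy case only the strong pair (taxon_a, taxon_b) survives; the weak correlations involving taxon_c are set to zero. Hard thresholding alone does not guarantee a positive definite matrix, so the sketch should be read only as an illustration of the sparsity step.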


Simulation studies showed that SECOM achieved almost perfect true positive rates and had uniformly smaller false positive rates than existing methods in detecting both linear and non-linear relationships. SECOM also maintained scale invariance when analyzing datasets from multiple ecosystems. Applying SECOM to global gut microbiome data, we discovered a potential commensal cluster of Bifidobacterium species among infants (age < 2 years).


In contrast to standard Pearson/Spearman correlation coefficients, SECOM honors the compositionality of microbiome data and hence minimizes false positives and false negatives. Compared to compositional tools such as SparCC and proportionality, SECOM not only identifies linear correlations but also detects non-linear correlations among microbiota, and it remains scale-invariant when applied to data from heterogeneous ecosystems. Thus, SECOM fills an important gap in the literature.
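To illustrate why a linear coefficient alone can miss a dependence between two taxa, the sketch below compares the Pearson correlation with the (biased sample) distance correlation on a purely quadratic toy relationship. Distance correlation is used here only as one example of a non-linear dependence measure; the abstract does not state which measure SECOM uses, so that choice, the function names, and the data are assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def double_centered_distances(v):
    """Pairwise |v_i - v_j| with row, column, and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_mean = [sum(r) / n for r in d]  # equals column means (d is symmetric)
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_correlation(x, y):
    """Biased sample distance correlation; 0 when either variable is constant."""
    n = len(x)
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return sqrt(max(dcov2, 0.0) / denom)  # guard against tiny negative rounding

# Toy data (hypothetical): y depends on x exactly, but not linearly.
x = [float(v) for v in range(-5, 6)]
y = [v ** 2 for v in x]

r_linear = pearson(x, y)
r_distance = distance_correlation(x, y)
print("Pearson:", round(r_linear, 3))
print("Distance correlation:", round(r_distance, 3))
```

Because x is symmetric around zero, the Pearson coefficient vanishes even though y is fully determined by x, while the distance correlation is clearly positive.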