Microbiome Data Analysis Exercise

Differential Microbial Populations in Patients with Irritable Bowel Syndrome (IBS)

IBS is a disorder that affects between 25 and 45 million people in the United States. Its direct costs exceed $20 billion yearly, and it accounts for 1.5% of the US health care budget. The disorder affects the large intestine, and signs and symptoms include abdominal pain, bloating, cramping, gas, and diarrhea (IBS-D), constipation (IBS-C), or both (IBS-M). The Human Microbiome Project and the greater accessibility of tools for microbial metagenomic analysis have brought forward a possible role of the gut microbiome (GM) in IBS and IBS-D. Several studies have shown that IBS patients have different intestinal microbiota compared to healthy subjects.

The GM consists of over 1,000 unique microbial species and an estimated 10^14 individual cells (~10x the total number of human cells in each individual). Although the GM has recently received increased attention, there is a lack of consensus regarding the specific role(s) of the GM in IBS. Reports describing the GM in IBS patients have included: i) altered abundance of specific bacterial groups, ii) altered diversity of specific bacterial groups, and iii) altered ratios of specific bacterial groups.

Study Cohort

We have developed an expanding biorepository that contains demographic data and fecal and blood samples from 57 individuals. We collected data such as demographics and IBS subtype, and we measured vitamin D levels in the blood serum of healthy controls and IBS patients, as outlined in Table 1.

Table 1: IBS study cohort to date

Use of Explicet in Microbiome Analysis

Explicet makes it easy to combine the metadata of a microbiome dataset with the taxonomic classifications of sequence data that are generated by 16S (SSU rRNA) pipelines (such as QIIME or mothur) or by any taxonomy-based tool such as RDP Classifier, ARB, or SINA.

With Explicet, it is easy to select specific samples from large datasets based on metadata criteria, such as barcode (a.k.a. Illumina index), disease status, or vitamin D levels, and then create graphics (pie charts, bar charts, or heatmaps). Explicet also performs statistical analyses on the selected data, such as Two-Part (which provides p-values), Morisita-Horn/Bray-Curtis/ThetaYC (classic beta diversity sample metrics), or any of the many classic alpha diversity metrics such as Shannon or Chao1, rendering the results as Manhattan plots, heatmaps, or rarefied collector's curves.

For tutorial please see:

We are using this software to perform gut microbiome analyses in our human cohort. Some examples of our analyses are shown here:

Figure 1: Representative operational taxonomic unit (OTU) bar chart in patients that have IBS vs healthy individuals

We found that within our cohort:

• Class Clostridia is one of the most common taxa present across the samples

• Bacilli and Bacteroidia are also prevalent across samples

• Lachnospiraceae shows a major difference in prevalence

Alpha, Beta and Gamma diversity

α-diversity is the diversity within each site (local species pool). β-diversity represents the differences in species composition among sites. γ-diversity is the diversity of the entire landscape (regional species pool).
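The relationship between the three levels can be illustrated with a minimal sketch. It assumes simple richness (a count of taxa) as the diversity measure and uses Whittaker's multiplicative formulation (β = γ divided by mean α), which is only one of several ways to define β-diversity and is not necessarily what the tools described in this exercise compute. The site names and taxa are hypothetical.

```python
# Hypothetical presence data: each site maps to the set of taxa observed there.
sites = {
    "site_A": {"Clostridia", "Bacilli", "Bacteroidia"},
    "site_B": {"Clostridia", "Bacilli"},
    "site_C": {"Clostridia", "Gammaproteobacteria"},
}


def alpha_richness(sites):
    """Number of taxa at each site (local diversity)."""
    return {name: len(taxa) for name, taxa in sites.items()}


def gamma_richness(sites):
    """Number of distinct taxa across all sites (regional diversity)."""
    return len(set().union(*sites.values()))


def whittaker_beta(sites):
    """Whittaker's multiplicative beta: gamma divided by mean alpha."""
    alphas = alpha_richness(sites).values()
    mean_alpha = sum(alphas) / len(sites)
    return gamma_richness(sites) / mean_alpha
```

A β value of 1 would mean every site holds the full regional pool; larger values mean the sites share fewer taxa.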

Shannon index α-diversity Test

The Shannon index summarizes the diversity of a single sample, taking into account both how many taxa are present and how evenly the microbes are distributed among them. It answers the question "How diverse is the sample?" Are the microbes balanced relative to one another, with species at similar abundance levels (evenness), or do some species dominate others?

Figure 2: Shannon index showing that data are representative of the specimens

The curves for both the control and IBS groups are asymptotic; thus, the data are representative of the samples.
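For readers who want to see the calculation behind the Shannon index, here is a minimal sketch using the standard definition H = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to taxon i. The natural logarithm is an assumption (some tools use log base 2), and the function names are illustrative rather than part of Explicet.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon index divided by its maximum, ln(number of observed taxa)."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is not meaningful with fewer than two taxa
    return shannon_index(counts) / math.log(observed)


# Hypothetical read counts for four taxa in two samples.
even_sample = [25, 25, 25, 25]      # all taxa equally abundant
dominated_sample = [97, 1, 1, 1]    # one taxon dominates
```

With the same number of taxa, the evenly distributed sample yields the higher Shannon index, which is the "evenness versus dominance" contrast described above.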

Morisita-Horn β-diversity Test

Beta diversity is a conceptual link between diversity at local and regional scales. Various additional methodologies of quantifying this and related phenomena have been applied. Among them, measures of pairwise (dis)similarity of sites are particularly popular. Undersampling, i.e. not recording all taxa present at a site, is a common situation in ecological data. Bias in many metrics related to beta diversity must be expected, but only a few studies have explicitly investigated the properties of various measures under undersampling conditions.

β-diversity shows the difference between microbial communities from different environments (IBS vs control). The main focus is on the difference in taxonomic abundance profiles between samples. It answers the question "How different is the microbial composition in IBS patients compared to healthy controls?"

Figure 3: Morisita-Horn heat map. Lighter colors indicate a higher degree of similarity (red is 100% similar), while darker colors indicate greater dissimilarity (black is 0% similar).

Two samples, IBS-024 and IBS-012, showed a unique microbiome, while all other samples showed varying degrees of difference from one another.
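The pairwise similarity values behind such a heat map can be sketched as follows. This uses the textbook Morisita-Horn formula written in terms of relative abundances, C = 2 Σ p_i q_i / (Σ p_i² + Σ q_i²), where 1 means identical abundance profiles and 0 means no shared taxa. It is an illustration under that assumption, not Explicet's own code, and the function name is hypothetical.

```python
def morisita_horn(x, y):
    """Morisita-Horn similarity between two samples.

    x and y are read counts for the same taxa, in the same order.
    Returns a value between 0 (no shared taxa) and 1 (identical profiles).
    """
    if len(x) != len(y):
        raise ValueError("both samples must list the same taxa")
    total_x, total_y = sum(x), sum(y)
    if total_x == 0 or total_y == 0:
        raise ValueError("each sample needs at least one read")
    # Convert counts to relative abundances so sequencing depth cancels out.
    p = [v / total_x for v in x]
    q = [v / total_y for v in y]
    numerator = 2 * sum(a * b for a, b in zip(p, q))
    denominator = sum(a * a for a in p) + sum(b * b for b in q)
    return numerator / denominator
```

Because the index works on proportions, two samples with the same composition but different sequencing depths score as fully similar.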

Figure 4: Gut microbiome in IBS and disease. Panel A: Cohort biodiversity was higher in healthy control subjects than IBS patients as assessed by a two-tailed heteroscedastic t-test (p=0.046). Panel B: At the phylum level, Firmicutes was enriched in IBS by 17.8% while Bacteroidetes and Actinobacteria populations were decreased by 1.2% and 11.5%, respectively. In female IBS patients, Bacteroidetes was decreased by 85.3% and Firmicutes was enriched by 14.7% compared to healthy subjects.
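A two-sample t-test that does not assume equal variances (heteroscedastic) is commonly known as Welch's t-test. The sketch below computes its test statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. Treating the test in Panel A as Welch's test is an assumption, and the function name is illustrative.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    a and b are per-sample diversity values for the two groups
    (for example, healthy controls and IBS patients).
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

The two-tailed p-value is then obtained from the t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.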

Use of Calypso in Microbiome Analysis

Calypso has a focus on robust multivariate statistical approaches that can identify complex environment-microbiome associations, whereby differences in microbial composition can be attributed to multiple environmental variables. Powerful visualization techniques are provided, generating high quality figures that can readily be used in scientific publications. Calypso enables, at all taxonomic levels and for large taxonomic datasets, quantitative visualizations and comparisons of composition (e.g. bubbleplots, interactive hierarchical trees, Krona plots and heatmaps), parametric and non-parametric statistical tests, univariate and multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis, and diversity estimates. To render the data suitable for analysis by standard statistical procedures, Calypso can transform and normalize community profiles using various methods, including log transformation and total-sum normalization. Since Calypso does not require scripting or programming skills, it enables all microbiology researchers to routinely apply advanced statistical analysis in their work.
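To make the two normalization steps named above concrete, here is a minimal sketch of total-sum normalization and a log transformation applied to one sample's read counts. The pseudocount of 1 and the base-2 logarithm are assumptions chosen for illustration; Calypso's exact settings may differ, and the function names are hypothetical.

```python
import math


def total_sum_normalize(counts):
    """Convert raw read counts to relative abundances that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def log_transform(counts, pseudocount=1):
    """Log2-transform counts; the pseudocount (an assumption) avoids log(0)."""
    return {taxon: math.log2(n + pseudocount) for taxon, n in counts.items()}


# Hypothetical read counts for one sample.
sample = {"Firmicutes": 700, "Bacteroidetes": 250, "Actinobacteria": 50}
relative = total_sum_normalize(sample)
```

Total-sum normalization removes differences in sequencing depth between samples, while the log transformation reduces the influence of a few highly abundant taxa.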

Calypso is easy-to-use online software that allows users to mine, interpret, and compare taxonomic information from metagenomic or 16S rRNA gene sequencing datasets. The software is free for academic use.

For tutorial please see:


Sequencing depth for microbiome analysis

Figure 5: Sequencing depth per sample. In most microbiome analyses, 100,000 reads per sample is considered acceptable and valid for further analysis.

Sequencing depth has a major impact on the sensitivity of 16S rRNA microbiome analyses for taxonomic profiling of samples with 90% host DNA. As sequencing depth decreases, the number of microbial species that go undetected increases, along with the number of unclassified and misclassified clades.
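The effect of depth on detection can be explored with a small simulation that randomly subsamples a sample's reads without replacement (rarefaction) and counts how many taxa remain. This is an illustrative sketch: the read counts are hypothetical, and the function names are not taken from any of the tools mentioned here.

```python
import random


def subsample_reads(counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from a sample."""
    # Expand the count table into one entry per read.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    subsampled = {}
    for taxon in rng.sample(reads, depth):
        subsampled[taxon] = subsampled.get(taxon, 0) + 1
    return subsampled


def taxa_detected(counts, depth, seed=0):
    """Number of taxa with at least one read after subsampling."""
    return len(subsample_reads(counts, depth, seed))


# Hypothetical sample with one dominant taxon and one rare taxon.
sample = {"Clostridia": 9000, "Bacilli": 900, "Bacteroidia": 95, "rare_taxon": 5}
```

Calling `taxa_detected(sample, depth)` over a range of depths shows rare taxa dropping out as fewer reads are drawn.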

Microbial sample composition of top 20 taxa

Figure 6: The dendrogram shows the taxonomic relationships of the microbial sample composition of the top 20 taxa in the cohort. The green boxes denote the IBS group samples.

Biodiversity of samples across the variables of the human cohort

i) Disease status (Healthy vs IBS) ii) Vitamin D levels iii) Sex (Females vs Males)

Figure 7: Analysis of Variance (ANOVA) statistical test on Shannon index α-diversity. The Shannon index is a measurement of biodiversity that captures both species richness and evenness as a single value; larger values denote greater biodiversity. In all ANOVA tests (i, ii, iii), p > 0.05, meaning that the groups are not significantly different from each other.
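As a rough illustration of what the ANOVA in Figure 7 computes, the sketch below derives the one-way ANOVA F statistic from per-sample Shannon values grouped by a cohort variable. It uses only the Python standard library and is not Calypso's implementation; the function name is hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with its two degrees of freedom.

    groups is a list of lists, one list of Shannon index values per group
    (for example, healthy vs IBS). Assumes at least some within-group variation.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k
```

The p-value comes from the F distribution with the returned degrees of freedom. If SciPy is available, `scipy.stats.f_oneway` accepts the groups as separate arguments and returns both the F statistic and the p-value.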