Illumina Connected Multiomics provides a powerful data science platform to streamline 5-base methylation and genomic multiomic analyses. The platform enables teams to design, experiment, collaborate seamlessly, and interact with traditionally complex workflows in real time. Connected Multiomics transforms raw data into actionable biological insights. It digests DRAGEN outputs into a unified, multisample data structure that facilitates cohort-level analyses. This architecture simplifies common tasks such as data quality filtering, unsupervised clustering, and differential methylation analysis. Furthermore, it enables multiomic integration of informative methylation features and genomic variants. Here, we demonstrate a representative analysis workflow to showcase the capabilities of Connected Multiomics with an acute myeloid leukemia (AML) sample cohort.
Data quality control
The platform first ingests the outputs of DRAGEN and summarizes the data set at the multisample cohort level. Figure 1 illustrates an automatically generated dashboard that visualizes the distribution of common whole-genome sequencing quality control metrics across the cohort. Percent methylation per sample is defined as the average methylation level across all CpG positions in the sample genome. Percent unmethylated control and percent methylated control represent the average methylation across all CpG positions in spiked-in control genomes and are used to assess methylation conversion efficiency. Higher methylation levels in the methylated control and lower methylation levels in the unmethylated control indicate improved conversion quality.
Figure 2 shows how you can visualize a histogram over a QC metric of interest and set custom filters. These filters can exclude samples to potentially improve the quality of downstream data analysis.
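The per-sample methylation metric and a threshold filter can be sketched in a few lines of Python. This is a minimal illustration and not the platform's implementation: the input layout (methylated and total read counts per CpG), the function names, the metric keys, and the thresholds are all assumptions made for the example.

```python
from statistics import mean

def percent_methylation(cpg_counts):
    """Average per-CpG methylation level, as a percentage.

    cpg_counts: list of (methylated_reads, total_reads), one per CpG position.
    Positions without coverage are skipped.
    """
    levels = [meth / total for meth, total in cpg_counts if total > 0]
    return 100.0 * mean(levels)

def filter_samples(qc_table, metric, minimum=None, maximum=None):
    """Return the samples whose metric lies within [minimum, maximum]."""
    kept = []
    for sample, metrics in qc_table.items():
        value = metrics[metric]
        if minimum is not None and value < minimum:
            continue
        if maximum is not None and value > maximum:
            continue
        kept.append(sample)
    return kept

# Hypothetical QC values (illustrative numbers, not real data).
qc_table = {
    "sample_1": {"pct_methylated_control": 98.1, "pct_unmethylated_control": 0.3},
    "sample_2": {"pct_methylated_control": 91.0, "pct_unmethylated_control": 0.5},
    "sample_3": {"pct_methylated_control": 97.4, "pct_unmethylated_control": 2.6},
}

# Assumed thresholds: methylated control >= 95%, unmethylated control <= 1%.
high_meth = filter_samples(qc_table, "pct_methylated_control", minimum=95.0)
low_unmeth = filter_samples(qc_table, "pct_unmethylated_control", maximum=1.0)
kept = [s for s in high_meth if s in low_unmeth]
print(kept)
```

Appropriate cutoffs depend on the assay and the study, so the values above should be read as placeholders.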
Supervised and unsupervised clustering
After a sample cohort is defined, you can perform exploratory analyses, such as clustering, to visualize global structure and heterogeneity within the data set. Connected Multiomics supports clustering at both single-CpG resolution and over aggregated genomic features, such as promoter regions, where CpG methylation is averaged across each feature. In addition, you can define custom feature sets tailored to the context of the study to enhance clustering performance further.
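As a rough sketch of feature-level aggregation, the following Python averages CpG methylation levels over named genomic intervals. The data layout, the inclusive coordinate convention, and the feature names are assumptions for illustration only; the platform's internal representation is not described here.

```python
def feature_methylation(cpg_levels, features):
    """Average CpG methylation within each feature.

    cpg_levels: iterable of (chrom, pos, level) with level in [0, 1].
    features:   dict of name -> (chrom, start, end), coordinates inclusive.
    Returns a dict of name -> mean level, or None if no CpG falls in the feature.
    """
    sums = {name: [0.0, 0] for name in features}
    for chrom, pos, level in cpg_levels:
        for name, (f_chrom, start, end) in features.items():
            if chrom == f_chrom and start <= pos <= end:
                sums[name][0] += level
                sums[name][1] += 1
    return {name: (total / n if n else None) for name, (total, n) in sums.items()}

# Hypothetical CpG calls and promoter intervals for one sample.
cpg_levels = [
    ("chr1", 100, 0.9),
    ("chr1", 150, 0.7),
    ("chr1", 5000, 0.1),
    ("chr2", 100, 0.5),
]
features = {
    "promoter_X": ("chr1", 1, 200),
    "promoter_Y": ("chr1", 4000, 6000),
    "promoter_Z": ("chr3", 1, 100),
}
per_feature = feature_methylation(cpg_levels, features)
print(per_feature)
```

A custom feature set is then just a different `features` dictionary passed to the same function. The nested loop is quadratic and only suitable for small examples; a genome-scale version would use an interval index.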
Figure 3 illustrates how you can evaluate principal component analysis (PCA) clustering performance using generic promoter regions or a custom region set of AML-specific epigenomic features. Notably, certain AML subtypes, including KMT2Ar and IDH-mutant cases, show improved separation when clustering is performed using AML-specific features. To enhance clustering performance further, non-linear dimensionality reduction methods, such as UMAP and t-SNE, are also supported. However, these methods often require parameter optimization.
For uniform manifold approximation and projection (UMAP), parameters such as the number of principal components and the number of nearest neighbors must be carefully tuned. Figure 4 illustrates how you can set up multiple UMAP optimizations and visualize the results together. From this UMAP parameter screen, UMAP parameter set 3 achieves strong separation of all AML subtypes.
To validate the clustering results, Figure 5 shows the application of k-means clustering over a range of cluster numbers, identifying five as the optimal number for this data set. You can then annotate the UMAP with the k-means cluster labels, with the number of clusters parameter set to five. Agreement between the k-means labels and the visually observed clusters supports their biological relevance.
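The idea of scanning a range of cluster numbers can be sketched as follows. The text does not state which criterion the platform uses to pick the optimal number, so this example assumes the mean silhouette score; the k-means routine (Lloyd's algorithm with a deterministic farthest-point initialization) and the toy two-dimensional points are also assumptions, standing in for a real low-dimensional embedding.

```python
import math

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # within-cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)  # nearest other cluster
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy embedding: three well-separated groups of four points each.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]

scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k)
```

On real data, the resulting labels for the chosen number of clusters can be compared with the groups seen in the UMAP, which is the kind of cross-check described above.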
Differentially methylated region calling
Connected Multiomics streamlines the identification of differentially methylated regions (DMRs) by integrating a widely used DMR caller, dispersion shrinkage for sequencing data (DSS), directly into its interactive sandbox environment. Sample groupings can be defined from metadata or from cluster labels produced by the PCA/UMAP tasks. DSS models the methylation at each CpG position with a beta-binomial distribution, and statistically significant differentially methylated positions between sample groups are stitched together to create DMRs. Figure 6 shows how DMRs can be easily visualized and filtered for downstream analyses. Consistent with the literature, AML patients carrying IDH mutations typically have broadly hypermethylated phenotypes, which results in a larger number of hypermethylated DMRs than hypomethylated ones. The diff.Methy metric represents the average methylation difference between the two sample groups over a specific genomic region, and the length is the base-pair length of the DMR. The areaStat metric sums the test statistics of all the CpG positions in a DMR, and is therefore most strongly correlated with DMR length. Longer DMRs that have larger methylation differences will result in a larger absolute areaStat value. Significance labels are provided as a guide to help you interpret DMRs at a glance. However, biological context and study-specific priors must guide the interpretation of DMRs.
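To make the stitching step and the reported metrics concrete, the sketch below merges neighboring significant CpG positions into regions and summarizes each one. It is a deliberate simplification and not the DSS algorithm: the thresholds, the rule that any non-significant position ends a region, and the per-CpG averaging used for diff.Methy are assumptions, and the input values are invented.

```python
def summarize(run):
    """Summarize a run of adjacent significant CpG sites as one region."""
    return {
        "start": run[0]["pos"],
        "end": run[-1]["pos"],
        "length": run[-1]["pos"] - run[0]["pos"] + 1,
        "nCG": len(run),
        # Simplification: mean of per-CpG group differences.
        "diff.Methy": sum(site["diff"] for site in run) / len(run),
        # Sum of per-CpG test statistics over the region.
        "areaStat": sum(site["stat"] for site in run),
    }

def stitch_dmrs(sites, p_threshold=0.05, max_gap=50, min_cpgs=3):
    """Merge significant CpG sites (sorted by position, one chromosome)."""
    dmrs, run = [], []
    for site in sites:
        significant = site["pvalue"] < p_threshold
        if significant and run and site["pos"] - run[-1]["pos"] <= max_gap:
            run.append(site)  # extend the current region
            continue
        if len(run) >= min_cpgs:
            dmrs.append(summarize(run))  # close the previous region
        run = [site] if significant else []
    if len(run) >= min_cpgs:
        dmrs.append(summarize(run))
    return dmrs

# Hypothetical per-CpG test results between two sample groups.
sites = [
    {"pos": 100, "stat": 3.0, "pvalue": 0.01, "diff": 0.3},
    {"pos": 120, "stat": 4.0, "pvalue": 0.001, "diff": 0.4},
    {"pos": 140, "stat": 3.5, "pvalue": 0.004, "diff": 0.2},
    {"pos": 400, "stat": 0.2, "pvalue": 0.8, "diff": 0.01},
    {"pos": 900, "stat": 2.9, "pvalue": 0.02, "diff": 0.25},
]
dmrs = stitch_dmrs(sites)
print(dmrs)
```

Because areaStat is a sum over positions, a region with more CpGs accumulates a larger absolute value even when the per-position statistics are similar, which is why it tracks DMR length so closely.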
Pathway analysis
Following DMR calling, Connected Multiomics facilitates the translation of DMRs into more functional inferences. Figure 7 shows how DMRs of interest can be filtered for high methylation differences (for example, greater than 0.2 methylation difference) and annotated with gene names within 5 kb of transcription start/stop sites. You can customize the maximum genomic distance to tune the interpretation of DMR-gene associations relevant to the biological context of your study.
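The filtering and annotation described here can be sketched as below, using the 0.2 difference and 5 kb distance from the example in the text as defaults. The record layout, the function name, and the gene names and coordinates are hypothetical, and applying the filter to the absolute difference is an assumption.

```python
def annotate_dmrs(dmrs, genes, min_abs_diff=0.2, max_distance=5000):
    """Keep DMRs with a large methylation difference and attach nearby genes.

    dmrs:  list of dicts with chrom, start, end, diff.
    genes: list of dicts with name, chrom, start, end, where start and end
           stand for the transcription start and stop sites.
    """
    annotated = []
    for dmr in dmrs:
        if abs(dmr["diff"]) <= min_abs_diff:
            continue
        hits = []
        for gene in genes:
            if gene["chrom"] != dmr["chrom"]:
                continue
            for site in (gene["start"], gene["end"]):
                # Distance from the site to the DMR interval (0 if inside it).
                distance = max(dmr["start"] - site, site - dmr["end"], 0)
                if distance <= max_distance:
                    hits.append(gene["name"])
                    break
        annotated.append({**dmr, "genes": hits})
    return annotated

# Hypothetical DMRs and gene models.
dmrs = [
    {"chrom": "chr1", "start": 10_000, "end": 10_400, "diff": 0.35},
    {"chrom": "chr1", "start": 50_000, "end": 50_300, "diff": 0.05},
    {"chrom": "chr2", "start": 7_000, "end": 7_200, "diff": -0.40},
]
genes = [
    {"name": "GENE_A", "chrom": "chr1", "start": 12_000, "end": 30_000},
    {"name": "GENE_B", "chrom": "chr2", "start": 90_000, "end": 95_000},
]
annotated = annotate_dmrs(dmrs, genes)
print(annotated)
```

Raising or lowering `max_distance` changes which DMR-gene associations are reported, which mirrors the tunable distance described above.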
Because DNA methylation typically regulates gene expression at promoters, most DMRs associated with genes are localized to transcription start site (TSS) regions. Depending on the applied filtering criteria, identified genes can exhibit either hypo- or hypermethylation relative to the IDH-mutant patient group. These gene-level findings can be contextualized further at the pathway level using the Connected Multiomics integrated gene set enrichment analysis. This functionality enables a broader interpretation of the underlying biological processes.
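For intuition about pathway-level testing, the sketch below runs a simple over-representation test with the hypergeometric distribution. This is a swapped-in technique chosen for brevity: the text does not specify the statistic behind the integrated gene set enrichment analysis, and the gene names, pathway, and background here are hypothetical.

```python
from math import comb

def hypergeom_pvalue(universe, set_size, drawn, overlap):
    """P(X >= overlap) when `drawn` genes are sampled without replacement
    from `universe` genes, of which `set_size` belong to the pathway."""
    upper = min(set_size, drawn)
    favorable = sum(
        comb(set_size, k) * comb(universe - set_size, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(universe, drawn)

def pathway_enrichment(dmr_genes, pathway_genes, background_genes):
    """Over-representation p-value for one pathway."""
    background = set(background_genes)
    hits = set(dmr_genes) & background
    pathway = set(pathway_genes) & background
    overlap = len(hits & pathway)
    return overlap, hypergeom_pvalue(len(background), len(pathway), len(hits), overlap)

# Hypothetical inputs: 10 background genes, 3 in the pathway, 3 DMR genes.
background_genes = [f"GENE_{i}" for i in range(10)]
pathway_genes = ["GENE_0", "GENE_1", "GENE_2"]
dmr_genes = ["GENE_0", "GENE_1", "GENE_2"]

overlap, p_value = pathway_enrichment(dmr_genes, pathway_genes, background_genes)
print(overlap, p_value)
```

When many pathways are tested, the resulting p-values would normally be adjusted for multiple testing before interpretation.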
Multiomic analysis
Variant analysis modules
Connected Multiomics provides a unified environment for integrating methylation and genomic variant analyses, unlocking the multiomic potential of the Illumina 5-base assay. The representative workflow described in this section overlays DMRs with genes containing small genomic variants, including single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). Figure 8 shows how variants can be filtered using standard variant call format (VCF) fields such as depth (DP). In addition, Connected Multiomics uses Illumina-specific and popular public databases to refine the variants of interest further. For example, gnomAD, the DRAGEN Haplotype Database, and PrimateAI can be used to remove germline variants from somatic variant calling results. PromoterAI can be used to predict the effect of promoter variants on gene activity. Figure 9 shows how variants can also be viewed at the cohort level to identify variants shared among samples.
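A depth filter on VCF records can be sketched with plain string handling. This example assumes that DP is stored in the INFO column (it can also appear as a per-sample FORMAT field), and the records and the threshold of 20 are made up for illustration.

```python
def info_field(vcf_line, key):
    """Return the value of an INFO key from one VCF data line, or None."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = columns[7]  # INFO is the eighth tab-separated column
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None

def passes_depth(vcf_line, min_depth=20):
    """True if the record has an INFO DP value of at least min_depth."""
    depth = info_field(vcf_line, "DP")
    return depth is not None and int(depth) >= min_depth

# Hypothetical VCF data lines (CHROM POS ID REF ALT QUAL FILTER INFO).
records = [
    "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=35;AF=0.48",
    "chr1\t2000\t.\tC\tT\t12\tPASS\tDP=8;AF=0.25",
    "chr2\t3000\t.\tG\tGA\t40\tPASS\tAF=0.10",
]
kept = [line for line in records if passes_depth(line)]
print(len(kept))
```

Header lines beginning with `#` would need to be skipped before applying the filter to a real file.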
Methylation and variant integration modules
Connected Multiomics integrates methylation and variant information at the gene level, so both DMRs and variants must first be annotated with genes, as shown in Figures 7 and 10, respectively. This gene-centric integration prioritizes functionally relevant regions of the genome, with planned extensions to additional regulatory loci in future releases. Figure 11 shows the output table after DMRs and variants are intersected. This output has been embellished with a regional methylation view and additional graphics generated outside of Connected Multiomics to provide context. In this example locus, there is a cluster of variants at the HOXA9 gene in KMT2Ar patients, which correlates with hypomethylation of the HOXA9 gene. This correlation could imply that these HOXA9 variants have functional consequences, because hypomethylation of a gene is often associated with its expression. Thus, DMRs could provide functional evidence to help interpret variants of unknown significance.
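Conceptually, the gene-level intersection is a join on gene name. The sketch below shows one way to express it; the record layouts, identifiers, and gene names are hypothetical, and the platform's output table contains more fields than are modeled here.

```python
from collections import defaultdict

def intersect_by_gene(annotated_dmrs, annotated_variants):
    """Group DMRs and variants by gene and keep genes that have both."""
    dmrs_by_gene = defaultdict(list)
    for dmr in annotated_dmrs:
        for gene in dmr["genes"]:
            dmrs_by_gene[gene].append(dmr["id"])

    variants_by_gene = defaultdict(list)
    for variant in annotated_variants:
        variants_by_gene[variant["gene"]].append(variant["id"])

    shared = sorted(set(dmrs_by_gene) & set(variants_by_gene))
    return {
        gene: {"dmrs": dmrs_by_gene[gene], "variants": variants_by_gene[gene]}
        for gene in shared
    }

# Hypothetical gene-annotated DMRs and variants.
annotated_dmrs = [
    {"id": "dmr_1", "genes": ["GENE_A"], "diff": -0.35},
    {"id": "dmr_2", "genes": ["GENE_B", "GENE_C"], "diff": 0.28},
]
annotated_variants = [
    {"id": "var_1", "gene": "GENE_A"},
    {"id": "var_2", "gene": "GENE_A"},
    {"id": "var_3", "gene": "GENE_D"},
]
table = intersect_by_gene(annotated_dmrs, annotated_variants)
print(table)
```

Genes that carry only DMRs or only variants drop out of the result, which is what makes the intersection useful for prioritizing loci with both kinds of evidence.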
Workflow visualization
Through the AML case study presented, we demonstrate an end-to-end analysis starting with data quality control in Figure 12. Connected Multiomics provides methylation and variant analysis tools that take advantage of the multiomic nature of the Illumina 5-base data type. You can perform rigorous clustering validation, DMR calling based on metadata and cluster labels, and contextualization of DMRs with gene and pathway information. In parallel, you can annotate and filter genomic variants and visualize variants at the cohort level. Variants can be annotated further with DMRs to provide a more complete interpretation of regulatory and genetic drivers underlying disease. Figure 12 also highlights the transparency of a collaborative analysis because teams can track progress in real time and branch analyses. In summary, these capabilities demonstrate how Connected Multiomics brings multiomic data, analysis, and interpretation into a single transparent and collaborative environment, accelerating biological insights from Illumina 5-base data sets.
For more information, read the Illumina Connected Multiomics 5-base flyer or visit the Illumina Connected Multiomics webpage.
Related links:
Blog: An introduction to the Illumina 5-Base Solution
Blog: From sample to insight: Streamlining analysis with the Illumina 5-base solution