Informatics advances reveal the TruPath™ Genome towards comprehensive genomic insights

Introduction

Standard short-read sequencing-by-synthesis technology (SBS) produces billions of sequencing reads up to 500 base pairs (bp) in length. Benefits of standard short-read sequencing include high accuracy and throughput at scale. Illumina’s short-read sequencing benefits from a mature and well-established ecosystem of data analysis tools and pipelines. Despite these advantages, short-read technology has limited ability to interrogate a small proportion of challenging, complex regions of the genome, many of which harbor variants that have potential roles in human genetic disease.^{[1, 2]}

Some of the most medically important regions of the genome have remained out of reach until now. Illumina TruPath Genome (for Research Use Only), launched in 2026, delivers an efficient solution for comprehensive human genome sequencing. The simple workflow, with ~10 minutes of hands-on time, allows long-distance genomic insights to be generated using standard short-read sequencing on the NovaSeq X™ series. Here, we describe the various ways proximity information is leveraged within the DRAGEN™ software suite and demonstrate the resulting improvements in variant detection. Together, TruPath Genome and the DRAGEN Germline application fundamentally extend the reach of SBS, enabling long-distance phasing and variant/haplotype discovery in genomic regions that were previously inaccessible.

Highlights

Simplest sample-to-sequencer workflow with ~10 minutes of hands-on time

Enabled using existing NovaSeq X systems (v1.4 Digital package required)

Highly accurate and phased germline Single Nucleotide Variant (SNV) calling

Improved coverage of difficult-to-map regions of the genome

Ultra-long phasing

Reliable de novo haplotype-resolved variant calling in paralogous regions

Enhanced detection of structural variants

Improved resolution of Short Tandem Repeats (STRs)

Proximity mapped read technology

The TruPath Genome on-flow-cell library prep technology unlocks long-distance genomic insights with unprecedented simplicity, enabling >200 kb template reconstruction using standard short-read sequencing.

TruPath Genome with proximity-mapped read technology provides unprecedented workflow simplicity by eliminating standard library prep prior to sequencing (Figure 1A)^[3]. The TruPath Genome workflow is compatible with both high molecular weight (HMW) and standard molecular weight (SMW) extraction methods. The spatial proximity of neighboring nanowells allows for reconstruction of long-distance genomic connections extending to over 200 kb (Figure 1B), which can be used in a variety of ways.

Overview of TruPath Genome data analysis workflow in DRAGEN Germline

DRAGEN Germline integrates proximity information at every step—from mapping to SV calling—to deliver a more complete, phased, and structurally aware view of the genome.

The proximity-based long-range information available in TruPath Genome datasets can be leveraged in many different components of the analysis workflow to yield a more accurate and complete view of an individual’s genome. Many such improvements have been implemented in DRAGEN beginning in version 4.5 (Figure 2), making them accessible to the user with a single command either locally, on the cloud, or through a fully automated sequencing-to-results workflow. A complete analysis (all callers active) of a TruPath Genome (typically 60-70X) can be performed in under three hours per sample on a local DRAGEN v4 server. Regions that were previously ‘dark’ often hide variants linked to rare disease, so illuminating them is not just a technical win but also a win for clinical researchers.

Mapping and read phasing

TruPath dramatically improves read placement in regions that were previously difficult to map, unlocking >20 Mb of new genomic territory.

DRAGEN Germline read mapping in the context of TruPath Genome datasets leverages proximity information from neighboring clusters to confidently assign a higher proportion of reads to the correct genomic location in regions of high sequence homology. This approach significantly reduces the fraction of the genome with low mapping quality, making >20 megabases of previously challenging genomic regions accessible to variant detection (Figure 3A). Figure 3B shows an example of a clinically relevant gene, RHCE, where read mapping is significantly improved relative to standard short-read WGS.
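To make the idea of proximity-assisted placement concrete, the following is a minimal sketch and not DRAGEN's mapping algorithm. It assumes a read has several equally good candidate loci and that confidently mapped reads from neighboring clusters are available; the function name `place_ambiguous_read`, the coordinates, and the simple vote-counting rule are all hypothetical choices made for illustration. The 200 kb window echoes the linkage distance cited above.

```python
from collections import Counter

# Assumption: proximity links are informative out to roughly 200 kb.
MAX_LINK_DISTANCE = 200_000


def place_ambiguous_read(candidates, neighbor_positions, max_dist=MAX_LINK_DISTANCE):
    """Pick the candidate locus best supported by confidently mapped neighbors.

    candidates: list of (chrom, pos) loci with equally good alignments.
    neighbor_positions: list of (chrom, pos) for uniquely mapped reads that
        came from nearby clusters (hypothetical input).
    Returns (best_candidate, support); best_candidate is None on a tie.
    """
    if not candidates:
        return None, {}
    support = Counter()
    for c_chrom, c_pos in candidates:
        # Count neighbors on the same chromosome within the linkage window.
        support[(c_chrom, c_pos)] = sum(
            1
            for n_chrom, n_pos in neighbor_positions
            if n_chrom == c_chrom and abs(n_pos - c_pos) <= max_dist
        )
    ranked = support.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, dict(support)  # still ambiguous: do not force a choice
    return ranked[0][0], dict(support)


# Illustrative, made-up coordinates for two homologous loci.
candidates = [("chr7", 6_000_000), ("chr7", 6_750_000)]
neighbors = [("chr7", 5_950_000), ("chr7", 6_080_000), ("chr7", 6_700_000)]
best, support = place_ambiguous_read(candidates, neighbors)
print(best, support)
```

A production mapper would presumably weigh neighbor evidence probabilistically and fold it into mapping quality rather than take a hard vote; the sketch only shows why nearby unique reads can break a tie that sequence alone cannot.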

Further, to efficiently make use of phasing information in all downstream variant calling components, DRAGEN Germline leverages a novel phasing approach that phases reads to inferred ancestral haplotypes during the mapping stage. This approach selects the most closely related haplotype segment pairs from a haplotype database, while considering recombination rates and long-range proximity information, to provide probabilistic assignments of reads to haplotypes and to define phase blocks within which haplotypes remain consistent with high confidence. Such phased reads are output to a haplotagged BAM file and are leveraged within downstream variant calling steps to generate accurate and long-range phasing information between variants (Figure 3C).

Phased small variant calling

By combining improved mapping with haplotype-tagged reads, TruPath delivers the most accurate and complete small variant calls to date, now fully phased across long genomic distances.

TruPath Genome small variant calling benefits from both improved mapping and read phasing information. Higher mapping quality and accurate read placement in difficult-to-map regions of the genome enable confident calls in a broader set of genomic regions. Phased reads and their associated phasing quality are directly incorporated into the variant calling model and used for phased variant calling (i.e., treating 0|1 and 1|0 as distinct genotype hypotheses), outputting calls that are phased with respect to each other if they are within the same phase block. Such innovations lead to the most accurate and complete small variant call set to date (Figure 4A) along with long-range phasing between variants (Figure 4B).
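The distinction between 0|1 and 1|0 can be illustrated with a toy likelihood calculation. This is a sketch under simplifying assumptions, not the DRAGEN variant calling model: each read carries an observed allele, a haplotype tag, and a phasing confidence, and a single fixed base error rate is used. The function names and the numbers are hypothetical.

```python
import math

# Phased hypotheses: (allele on haplotype 0, allele on haplotype 1).
HYPOTHESES = {"0|0": (0, 0), "0|1": (0, 1), "1|0": (1, 0), "1|1": (1, 1)}


def allele_match_prob(observed, allele, base_error):
    """P(observed base | true allele) under a flat error model (assumption)."""
    return 1.0 - base_error if observed == allele else base_error


def phased_genotype_log_likelihoods(reads, base_error=0.01):
    """Score each phased genotype hypothesis.

    reads: list of (observed_allele, haplotype, phase_confidence), where
        observed_allele is 0 (ref) or 1 (alt), haplotype is 0 or 1, and
        phase_confidence is the probability that the tag is correct.
    """
    scores = {}
    for name, alleles in HYPOTHESES.items():
        log_lik = 0.0
        for observed, hap, conf in reads:
            # Mix over the chance that the read was tagged to the wrong haplotype.
            p = conf * allele_match_prob(observed, alleles[hap], base_error)
            p += (1.0 - conf) * allele_match_prob(observed, alleles[1 - hap], base_error)
            log_lik += math.log(p)
        scores[name] = log_lik
    return scores


# Five ref reads tagged to haplotype 0 and five alt reads tagged to haplotype 1.
reads = [(0, 0, 0.95)] * 5 + [(1, 1, 0.95)] * 5
scores = phased_genotype_log_likelihoods(reads)
best_genotype = max(scores, key=scores.get)
print(best_genotype)
```

With unphased reads, 0|1 and 1|0 would score identically; it is the haplotype tag and its confidence that separate them, which is the property the paragraph above relies on.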

Structural variant calling improvements

Haplotype-specific assemblies and colocation maps give TruPath a powerful new lens for detecting large and complex structural variants with higher confidence.

Structural variant (SV) detection in TruPath Genome data also benefits from both improved read mapping and phased reads. Because TruPath phases reads upfront, DRAGEN can assemble each haplotype separately, leading to cleaner assemblies and more accurate SV calls. This cleaner assembly process is responsible for most of the performance improvement in SV detection with TruPath Genome (Figure 5).

Colocation maps for structural variant interpretation

Colocation maps reveal genome structure in two dimensions, exposing inversions, translocations, and other large SVs through intuitive off-diagonal signals.

In addition to improved performance on the NIST structural variant truth set of insertions and deletions in HG002, TruPath Genome also generates a new kind of output referred to as colocation maps. Think of a colocation map like a heatmap of the genome’s 3D structure, in which unexpected off-diagonal signals reveal structural variants hiding in plain sight. Regions that have the same structure as the reference genome primarily display proximity signal along the colocation map diagonal, while regions where the individual’s genome structure differs from the reference display strong off-diagonal signals and signal depletion along the diagonal. Different types of structural variants show different patterns in the colocation maps (Figure 6A), and large clinically relevant structural variants can be clearly observed in such a representation (Figure 6B). Because the colocation map signal is independent of the signals currently used for structural variant detection in standard short-read sequencing (e.g., split reads and improperly paired reads), it can be used to filter large SVs that DRAGEN Germline SV detects from standard short-read sequencing signals. DRAGEN Germline applies this filter to break-end calls that are either inter-chromosomal or intra-chromosomal but larger than 200 kb. This drastically reduces the total number of such calls in a sample genome, most of which are putative false positives, allowing for a much smaller call set for these types of large events. Further integration of colocation map signals into SV detection will be incorporated in future DRAGEN Germline versions to improve sensitivity and specificity.
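As a rough illustration of what a colocation map encodes, the sketch below bins pairs of proximity-linked positions into a two-dimensional count matrix and measures how much signal lies away from the diagonal. It also encodes the stated eligibility rule for the break-end filter (inter-chromosomal, or intra-chromosomal and larger than 200 kb). The input format, function names, and the off-diagonal summary statistic are assumptions for illustration; the actual DRAGEN output format and filter logic are not described in this article.

```python
def build_colocation_map(linked_pairs, region_start, region_end, bin_size):
    """Bin proximity-linked position pairs into an upper-triangular count matrix.

    linked_pairs: iterable of (pos_a, pos_b) on one chromosome (hypothetical input).
    """
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in linked_pairs:
        if not (region_start <= pos_a < region_end and region_start <= pos_b < region_end):
            continue  # ignore pairs that fall outside the plotted region
        i = (pos_a - region_start) // bin_size
        j = (pos_b - region_start) // bin_size
        matrix[min(i, j)][max(i, j)] += 1  # store each pair once, above the diagonal
    return matrix


def off_diagonal_fraction(matrix, band=1):
    """Fraction of pairs whose bins are more than `band` bins apart."""
    total = 0
    off = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            total += count
            if abs(i - j) > band:
                off += count
    return off / total if total else 0.0


def breakend_is_filter_candidate(chrom_a, pos_a, chrom_b, pos_b, min_span=200_000):
    """True for break-ends in scope of the filter described in the text."""
    return chrom_a != chrom_b or abs(pos_a - pos_b) > min_span


# Two near-diagonal pairs and one long-range pair in a 10 kb toy region.
pairs = [(100, 150), (1_100, 1_900), (100, 9_500)]
cmap = build_colocation_map(pairs, region_start=0, region_end=10_000, bin_size=1_000)
fraction = off_diagonal_fraction(cmap)
print(fraction)
```

In a region that matches the reference, most counts would sit on or next to the diagonal, so a high off-diagonal fraction at a candidate breakpoint is the kind of independent evidence the filter could draw on.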

Small variant calling in paralogous regions

TruPath enables copy-number–aware, haplotype-resolved variant calling in highly homologous gene families, finally resolving paralogs long considered inaccessible to short reads.

DRAGEN Germline Multi‑Region Joint Detection (MRJD), paired with TruPath Genome, enables haplotype‑resolved, copy‑number–aware de novo germline small variant calling in segmental duplications. These regions are challenging for standard short‑read sequencing analysis because high homology and structural complexity can cause ambiguous or incorrect read mapping, which leads to unreliable variant detection. TruPath Genome enables MRJD to retain reads from the paralogous loci regardless of mapping quality, estimate total copy number using read‑depth evidence, and then reconstruct the underlying copies for each paralog set by integrating copy number, read sequences, and long‑range proximity linkage information. MRJD then calls small variants on the reconstructed copies and reports phased variant calls along with their assigned genomic locations or haplotypes. This variant calling process does not rely on known population haplotypes. For clinical researchers investigating Lynch Syndrome, distinguishing PMS2 from PMS2CL has long been a diagnostic challenge. TruPath resolves these regions with haplotype-level clarity. Figure 7 shows this for the highly homologous PMS2-PMS2CL pair (~21 kb each; ~99% identity), where standard short reads generate ambiguous, unphased variant calls across both loci, while TruPath Genome data with MRJD enables phased haplotypes that are concordant with on-market long-read results.
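The copy number step can be pictured with a simple depth normalization. This is a back-of-the-envelope sketch, not MRJD's estimator: it assumes reads from every copy in a paralog set are pooled, that all copies have the same length, and that depth scales linearly with copy number. The function name and the example depths are hypothetical.

```python
def estimate_total_copy_number(pooled_read_bases, copy_length_bp, diploid_depth):
    """Estimate total copies across a paralog set from pooled read depth.

    pooled_read_bases: aligned bases from reads assigned to any copy in the set.
    copy_length_bp: length of one copy of the duplicated segment.
    diploid_depth: typical depth over unique, two-copy sequence in the sample.
    """
    if copy_length_bp <= 0 or diploid_depth <= 0:
        raise ValueError("copy_length_bp and diploid_depth must be positive")
    pooled_depth = pooled_read_bases / copy_length_bp  # depth if stacked on one copy
    per_copy_depth = diploid_depth / 2                 # depth contributed by one copy
    return round(pooled_depth / per_copy_depth)


# Two loci of ~21 kb, two haplotypes each: four copies expected at 60X.
COPY_LENGTH = 21_000
normal = estimate_total_copy_number(120 * COPY_LENGTH, COPY_LENGTH, diploid_depth=60)
one_copy_lost = estimate_total_copy_number(90 * COPY_LENGTH, COPY_LENGTH, diploid_depth=60)
print(normal, one_copy_lost)
```

Pooling reads regardless of mapping quality is what makes this possible: the total is well defined even when individual reads cannot be assigned to a specific copy, and the assignment is deferred to the reconstruction step that uses proximity linkage.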

MRJD with TruPath Genome currently supports 15 clinically relevant genes; Table 1 summarizes the supported loci and concordance versus orthogonal long‑read data.

Table 1. Median SNV concordance of phased germline small variant calls against orthogonal long‑read data for medically relevant paralogous genes supported by MRJD with TruPath Genome at launch. Concordance was measured on 14 diverse cell line samples with both HMW and standard DNA extraction. For CFHR1-CFHR2-CFHR3-CFHR4 and USP18, no orthogonal comparator call set was available, so concordance is reported as N/A.

Paralogous gene	Disease relevance	HMW DNA median concordance	Standard DNA median concordance
*PMS2*	Lynch Syndrome	0.991	0.951
*SMN1-SMN2*	Spinal Muscular Atrophy	0.941	0.929
*NCF1*	Chronic Granulomatous Disease	0.992	0.991
*CYP21A2*	Congenital Adrenal Hyperplasia	1.000	1.000
*TNXB*	Ehlers-Danlos Syndrome	1.000	1.000
*STRC*	Recessive Nonsyndromic Hearing Loss	0.983	0.980
*CYP2D6*	Pharmacogenetics	0.973	0.976
*CYP11B1-CYP11B2*	Glucocorticoid-remediable Aldosteronism	0.997	0.997
*CFHR1-CFHR2-CFHR3-CFHR4*	Atypical Hemolytic Uremic Syndrome	N/A	N/A
*USP18*	Type I Interferonopathy	N/A	N/A

Improved short tandem repeat size estimation accuracy

Proximity signals allow TruPath to recover and assign in-repeat reads, enabling accurate STR sizing across the full expansion range—without the plateau seen in standard WGS.

Short tandem repeat (STR) expansions are associated with a wide range of neurological and neurodevelopmental disorders. While an STR expansion beyond the small size range observed in the healthy population may be a strong indication of pathogenicity, the expansion size is known to modulate the presentation of many associated conditions. Traditional whole-genome sequencing (WGS) is an effective method to distinguish non-expanded from expanded STRs, but its ability to accurately estimate the size of large STR expansions is limited because fully repetitive read pairs cannot be unambiguously assigned to a specific STR, which hampers a more nuanced classification of STR expansion status.

Proximity information available in TruPath Genome can help to resolve ambiguous mappings of reads fully contained within expanded STRs by assessing their proximity to the unique flanks of specific STR loci. The more complete recovery and assignment of in-repeat reads also allows for locus-specific adjustments to in-repeat read counts, which addresses biases associated with decreased sequencing efficiency of certain STR motifs. Furthermore, phasing information available in TruPath Genome allows for haplotype-specific STR size estimation even in cases where STR expansions occur in both parental haplotypes. This combined set of improvements leads to better STR size estimation and more nuanced and accurate STR expansion status classification (Figure 8).
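A simplified calculation shows why recovering in-repeat reads matters for sizing. The sketch assumes uniform coverage: a repeat of length L contains about (L - r) start positions for fully contained reads of length r, and reads start at a rate of d / r per base at haplotype depth d, so the in-repeat read count N is roughly (L - r) * d / r, which rearranges to L ≈ N * r / d + r. The efficiency factor stands in for the locus-specific adjustment mentioned above. This is an illustrative model with hypothetical names and numbers, not the DRAGEN STR caller.

```python
def estimate_str_length_bp(in_repeat_reads, read_length, haplotype_depth,
                           motif_efficiency=1.0):
    """Estimate repeat tract length from reads fully contained in the repeat.

    in_repeat_reads: in-repeat reads assigned to this locus and haplotype.
    read_length: read length in bp.
    haplotype_depth: sequencing depth of the haplotype carrying the expansion.
    motif_efficiency: assumed relative sequencing efficiency of the motif
        (1.0 means no bias; 0.8 means 20% of expected reads are lost).
    With zero in-repeat reads the result is read_length, which should be read
    as "about one read long or shorter", not as a precise size.
    """
    if haplotype_depth <= 0:
        raise ValueError("haplotype_depth must be positive")
    if not 0 < motif_efficiency <= 1:
        raise ValueError("motif_efficiency must be in (0, 1]")
    corrected_reads = in_repeat_reads / motif_efficiency
    return corrected_reads * read_length / haplotype_depth + read_length


# 40 in-repeat reads of 150 bp at 30X per haplotype, with 20% motif dropout.
length_bp = estimate_str_length_bp(40, read_length=150, haplotype_depth=30,
                                   motif_efficiency=0.8)
repeat_units = length_bp / 3  # for a trinucleotide motif
print(round(length_bp), round(repeat_units))
```

If in-repeat reads cannot be assigned to a locus, N is undercounted and the estimate plateaus near the read length, which matches the limitation of standard WGS described above.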

Conclusion and next steps

The launch of TruPath Genome marks a critical turning point for the traditional trade-offs between workflow complexity, accuracy, and comprehensive genomic insights. By integrating proximity-based long-distance insights into core components of the DRAGEN Germline data analysis workflow, Illumina has enabled standard short reads to resolve regions of the genome and types of variants long thought inaccessible to them.

For labs and researchers, the implications are profound^[4]:

Identification of variants associated with genetic and rare disease: By resolving paralogous genes, accurately sizing STR expansions, improving SV detection capabilities, and delivering phased variants, TruPath Genome offers a clear path to solving rare disease cases that were previously inaccessible.

Operational Efficiency: The ability to achieve best-in-class small variant accuracy and phase up to 98% of genes in under three hours of analysis time means labs can consolidate multiple assays into a single, streamlined workflow.

Accessible Long-distance insights: This technology brings high-resolution, haplotype-resolved WGS to existing NovaSeq X systems, making comprehensive human genomics accessible and scalable with up to 16 genomes per run.

TruPath does not just extend the reach of short reads; it redefines what they are capable of. TruPath Genome is now available commercially; however, this initial product is just the beginning of what can be achieved with this new data modality. Future DRAGEN releases will continue to build on these analysis capabilities. Sign up below to stay updated on future developments in DRAGEN Germline analysis of proximity-mapped read technology and on future TruPath Genome solutions.

Learn more about TruPath Genome

References

1. Ebbert MTW, Jensen TD, Jansen-West K, et al. Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight. Genome Biol. May 20 2019;20(1):97. doi:10.1186/s13059-019-1707-2

2. Ryan NM, Corvin A. Investigating the dark-side of the genome: a barrier to human disease variant discovery? Biol Res. Jul 20 2023;56(1):42. doi:10.1186/s40659-023-00455-0

3. Illumina. Introducing constellation mapped read technology. Genomics Research Hub blog. 2024. https://www.illumina.com/science/genomics-research/articles/constellation-mapped-read-technology.html

4. Cheng S, Zhang Q, Zheng X, et al. Constellation illuminates rare disease genetics. medRxiv. Nov 10 2025:2025.10.15.25337675. doi:10.1101/2025.10.15.25337675
