
PEER REVIEWED

GenomeNext's bioinformatics pipeline is the only pipeline used for clinical application and discovery of novel variants that has been peer-reviewed and published. Genome Biology has validated our bioinformatics process, accuracy, parallelization methods and claims of reproducibility and determinism for clinical application.

Validation of Pipeline

Abstract

While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources.
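The general idea of region-based parallelization with deterministic output can be illustrated with a small sketch. This is not Churchill's implementation: the chromosome names and lengths, the region size, and the `analyze_region` stand-in are all hypothetical, chosen only to show how equal-sized regions can be processed concurrently while the merged result stays in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy chromosome lengths (hypothetical values, not real genome coordinates).
CHROM_LENGTHS = {"chrA": 1_000, "chrB": 650, "chrC": 300}


def split_into_regions(chrom_lengths, region_size):
    """Cut every chromosome into consecutive regions of at most region_size bases."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, region_size):
            regions.append((chrom, start, min(start + region_size, length)))
    return regions


def analyze_region(region):
    """Hypothetical stand-in for the per-region work of a real pipeline."""
    chrom, start, end = region
    return {"region": region, "bases": end - start}


def run_pipeline(chrom_lengths, region_size, workers=4):
    regions = split_into_regions(chrom_lengths, region_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so the merged output
        # does not depend on which worker happens to finish first.
        return list(pool.map(analyze_region, regions))


results = run_pipeline(CHROM_LENGTHS, region_size=250)
for item in results:
    print(item["region"], item["bases"])
```

Because the results are collected in region order rather than completion order, repeated runs on the same input give the same merged output, which is the property the abstract refers to as determinism.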

Nationwide Children's Hospital in Collaboration with GenomeNext Wins the CLARITY Undiagnosed Challenge

The CLARITY Undiagnosed Challenge validates the utility of the GenomeNext platform for clinical genomic diagnosis.

The CLARITY Undiagnosed Challenge enrolled five families: three with affected children and two adults. New potential disease genes and novel genetic variants were identified for several of the families. In addition, the results generated by GenomeNext and Nationwide Children’s Hospital confirmed previous genetic findings for several families and ruled them out for others.

CLARITY:
Children’s Leadership Award for the Reliable Interpretation and Appropriate Transmission of Your Genomic Information

“Given the extremely large size of whole human genome sequence data, you run the risk of finding something that might not be a real, relevant factor. The patient cases for the CLARITY Challenge were particularly demanding because these patients already had been through extensive clinical and genetic tests. Therefore, one had to look for changes in unusual genes or rare genetic variants (including structural and non-coding variants) that are unlikely to be a part of routine clinical genetic testing. This information, with its uncertainties, had to be distilled into a report that would be understandable, help guide the clinicians, and provide information to the families.”

Dr. Peter White, Director of Nationwide Children’s Biomedical Genomics Core

Nationwide Children’s Hospital, utilizing GenomeNext’s genomic analysis pipeline, outperformed 26 leading organizations from around the world that ultimately completed the competition, delivering the most accurate and clinically actionable genetic data. 

Competing Organizations:

 

1. Bina Technologies (Redwood City, CA) 

2. Centre for Genomic Regulation (CRG) (Barcelona, Spain) 

3. Clinical Institute of Medical Genetics (Ljubljana, Slovenia) 

4. Codified Genomics, LLC (Houston, TX) 

5. Emory University School of Medicine (Atlanta, GA) 

6. Enlis Genomics (Berkeley, CA) 

7. Geisinger Health System (Danville, PA)

8. Gene.us (Austin, TX) 

9. Genomatix Software GmbH (Munich, Germany) 

10. Institute for Systems Biology (ISB) (Seattle, WA)

11. Inova Translational Medicine Institute (ITMI) (Falls Church, VA) 

12. Intelliseq (Krakow, Poland) 

13. Invitae Corporation (San Francisco, CA) 

14. Mendelics (Sao Paulo, Brazil) 

15. Miti Medicine Inc. (Cambridge, MA) 

16. QIAGEN (Redwood City, CA)

17. Rare Genomics Institute (Hanover, MD) 

18. Seven Bridges Genomics (Cambridge, MA)

19. SNPedia (Potomac, MD) 

20. SolveBio (New York, NY) 

21. Stanford University (Stanford, CA) 

22. University of Southern California (Los Angeles, CA) 

23. University of Utah (Salt Lake City, UT) 

24. Tel Aviv University (Tel Aviv, Israel) and Variantyx Ltd (Ashland, MA)

25. Tute Genomics (Provo, UT)

26. WuXi NextCODE Genomics (Cambridge, MA)

Why GenomeNext & NCH Won

We analyzed each case starting from the FASTQ data rather than the VCF provided by CLARITY. Running our proprietary secondary analysis pipeline from FASTQ to VCF allowed us to find variants missed by other groups. As a result, NCH was able to provide a more thorough clinical report for each case, identifying pathogenic variants that all other competitors missed.


Validating the GenomeNext Pipeline with the 1000 Genomes Project

The 1000 Genomes Project was the first project to sequence the genomes of a large number of people and to provide a comprehensive resource on human genetic variation. Its goal was to find most genetic variants that have frequencies of at least 1% in the populations studied. To demonstrate the genomic analysis pipeline’s utility for population-scale genomic analysis, 1,088 low-coverage whole-genome samples from “phase 1” of the 1000 Genomes Project (1KG) were processed from FASTQ to a single multi-sample VCF in 7 days using 400 Amazon EC2 instances (cc2.8xlarge spot instances). The total analysis cost was ~$12,000, inclusive of data storage and processing.
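The figures above imply a rough per-sample cost and compute budget, which the following sketch works out. It assumes, for simplicity, that all 400 instances ran for the full 7 days. The text does not state this, so the instance-hour numbers are an upper bound on compute time, and the hourly cost is only indicative because the ~$12,000 total also covers storage.

```python
# Figures quoted in the text.
samples = 1_088
instances = 400
days = 7
total_cost_usd = 12_000  # approximate, includes storage and processing

# Assumption: every instance ran for the whole 7 days (upper bound).
instance_hours = instances * days * 24

cost_per_sample = total_cost_usd / samples
instance_hours_per_sample = instance_hours / samples
cost_per_instance_hour = total_cost_usd / instance_hours

print(f"instance-hours (upper bound): {instance_hours:,}")
print(f"cost per sample:              ${cost_per_sample:.2f}")
print(f"instance-hours per sample:    {instance_hours_per_sample:.1f}")
print(f"cost per instance-hour:       ${cost_per_instance_hour:.3f}")
```

Under these assumptions the analysis works out to roughly $11 per sample.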

 

The GenomeNext analysis pipeline identified 41.2M genetic variants versus 1KG’s 39.7M. The two call sets had 34.4M variant sites in common, of which 34.3M had the same minor allele with highly similar frequencies. The results were validated against previously identified variants (dbSNP Build 138, excluding those from the 1KG submission). SNP validation rates were similar: 52.8% (GenomeNext) and 52.4% (1KG). However, due to improvements in indel calling since the original 1KG analysis, the analysis pipeline called threefold more indels with a higher rate of validation (19.5% vs. 12.5%). Of the indels unique to our analysis pipeline, a 7-fold higher rate of validation was observed compared to those unique to 1KG. Of the GIAB consortium’s validated indel dataset [13], 81.5% were observed in the “Churchill” analysis, in contrast to 43.9% with the 1KG analysis. Our analysis pipeline called ~71% of the 99,895 novel validated indels in the GIAB NA12878 dataset (those not found in the 1KG analysis) with alternative allele frequencies as high as 100% (mean 40.2%).
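The overlap between the two call sets can be made concrete with a little arithmetic on the quoted totals. The sketch below uses only numbers from the paragraph above. The derived quantities (sites unique to each call set, the shared fractions, and the Jaccard index) are not reported in the text and are computed here purely for illustration.

```python
# Figures quoted in the text, in millions of variant sites.
genomenext_sites = 41.2
kg_sites = 39.7
shared_sites = 34.4
same_minor_allele = 34.3

# Sites found by only one of the two analyses, and the combined total.
only_genomenext = genomenext_sites - shared_sites
only_kg = kg_sites - shared_sites
union_sites = genomenext_sites + kg_sites - shared_sites

# Fraction of each call set that the other analysis also found.
shared_frac_genomenext = shared_sites / genomenext_sites
shared_frac_kg = shared_sites / kg_sites

# Jaccard index: shared sites over all sites seen by either analysis.
jaccard = shared_sites / union_sites

# Of the shared sites, how many agree on the minor allele.
minor_allele_agreement = same_minor_allele / shared_sites

# "~71% of the 99,895 novel validated indels" as an approximate count.
novel_giab_indels = 99_895
called_novel_indels = round(0.71 * novel_giab_indels)

print(f"unique to GenomeNext: {only_genomenext:.1f}M, unique to 1KG: {only_kg:.1f}M")
print(f"shared fraction: {shared_frac_genomenext:.1%} of GenomeNext, {shared_frac_kg:.1%} of 1KG")
print(f"Jaccard index: {jaccard:.3f}")
print(f"minor-allele agreement among shared sites: {minor_allele_agreement:.1%}")
print(f"novel GIAB indels called (approx.): {called_novel_indels:,}")
```

On these totals, about 6.8M sites are unique to the GenomeNext call set and about 5.3M are unique to 1KG.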

 

In summary:

 

  • Until this analysis, the publicly available 1000 Genomes Project sequence data had never been analyzed through a single analytical pipeline.

  • The original analysis was performed by multiple organizations using different techniques, hardware, and analysis solutions. The resulting call set was released to the community and is currently used as the base sample (“Gold Standard”) against which other genome analyses are compared for medical determinations and scientific discovery.

  • We estimate that the original analysis of the 1000 Genomes data cost more than $100M and took years to complete.

  • When NCH used Churchill to perform the analysis on AWS, they found that the existing analysis of the 1000 Genomes data was extremely inaccurate, and they discovered 30,000 new “variants”.

  • Our analysis pipeline completed the analysis in 7 days, making it the fastest and least expensive analysis of the 1000 Genomes data to date.

  • This was the first time the 1000 Genomes data were analyzed through a single analytical pipeline that is accurate, deterministic, and 100% reproducible.

 

What it means:

 

All experiments and medical research based on the “Gold Standard” for genomic experiments and medical determinations are now open to question in light of the accuracy achieved by the GenomeNext analysis pipeline. Additionally, the analysis of the 1000 Genomes data performed by our analysis pipeline could become the sample baseline for the world.
