/ WHAT IS THE PROBLEM WITH GENETIC DIAGNOSTICS BASED ON NGS METHODS
At present, diagnostic methods based on high-throughput genome sequencing (so-called Next Generation Sequencing, NGS) are widely used in labs dealing with rare disorders. But NGS does not find the cause of the disorder for every patient and case.
Genetic tests search for mutations (so-called genetic variants) that cause the disease and its symptoms, following the ACMG recommendations applied in this field.
NGS methods identify the causative genetic variants in 1 to 60% of patients. The percentage depends on the type of disease and symptoms. [link]
/ WHAT ROLE DOES BIOINFORMATIC DATA PROCESSING PLAY IN THE PROBLEMS OF GENETIC DIAGNOSTICS
The basis of NGS diagnostics is the processing of DNA sequence data to identify mutations that cause the disease.
DNA sequence data come from a sequencing machine.
The data are processed by special programs that detect mutations in DNA. Most of these programs are open source and developed by different research groups all over the world. No single program can detect all mutations present in patient samples. A pipeline is a sequence of these programs together with their parameters. Every program has its advantages and disadvantages, so the percentage of true mutations detected depends on the pipeline.
Proper selection of programs and their settings can reveal mutations that previously went undetected. For example, this was the case for up to 15% of patients in a cohort of 156 previously conducted tests that had given no diagnosis. [link]
Genetic tests based on NGS do not detect 100% of true variants and miss a great number of them.
In other words, not all variants that truly exist in patient samples are detected by genetic tests. Variants in DNA are classified by their length in nucleotides and by the way they change DNA.
For single-nucleotide variants, reported detection rates are 86.6-95.3% and 96-97%. [link],[link]
A total of 24,379 short variants were not detected (false negatives) by 70 pipelines (sequences of programs with combinations of parameters) of the kind that underlie genetic tests. Only 94-99% of short variants were detected by at least one pipeline. This confirms that detection depends heavily on proper pipeline selection. [link]
The F-score for indels (the second most common type of variant) is 0.75-0.91. [link]
An F-score of 1 corresponds to detecting 100% of true indels with no false positives. The reported values therefore mean that a large number of indel variants are missed and that detection varies widely between tests and labs, depending on the pipeline. [link]
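To make the F-score concrete: it combines recall (the share of true variants that were detected) with precision (the share of detected variants that are true). Below is a minimal Python sketch of the standard F1 formula. The function name and the counts in the example are hypothetical and are not taken from the cited studies.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    detected = true_positives + false_positives   # everything the test reported
    real = true_positives + false_negatives       # everything actually in the sample
    if detected == 0 or real == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / real
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 800 true indels found, 100 false calls, 200 true indels missed.
example = f_score(800, 100, 200)
print(round(example, 3))
```

With these made-up counts the recall is 80%, so one in five true indels is missed even though the F-score falls inside the 0.75-0.91 range quoted above.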
Other studies confirm that the ability to detect indels longer than 50 nucleotides decreases significantly as the variant gets longer.
The ability to detect CNVs (the third most common type of variant) of different sizes is low, for example with an F-score below 0.5. [link]
Detection figures claimed by laboratories are based on limited data and might be overestimated by labs or misinterpreted by test users.
To estimate a test's ability to detect mutations, each laboratory should compare each test result with the true set of mutations for a number of samples collected during laboratory work. This is the essence of the validation process.
The true set (100% of true mutations) is taken to be those mutations that are covered by Sanger reads (short sequences of read DNA) or confirmed by the microarray method (only for known mutations from the databases).
Samples from Sanger sequencing are usually not completely covered by reads.
Without Sanger coverage of the entire genome/exome, there is no complete true set of mutations to compare the NGS results against.
Validation may also be done on samples from the 1000 Genomes Project (1KGP), which offers a limited number of well-studied samples.
These 1KGP samples do not represent the full diversity of sequencing data that exists in the population.
This means that the true percentage of mutations a test can detect is lower on samples outside the validation set. [link]
Quality numbers published by a lab depend greatly on how the test was validated.
Not only pipelines but also the specific patient sequencing data affect errors in the detection of genomic variants.
Detection results (lists of variants) from different pipelines agree by 82-97%, depending on the patient data set. [link]
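One simple way to express how much two pipelines agree is the share of variants they have in common. The Python sketch below uses the Jaccard index (shared variants divided by all variants reported by either pipeline). This is an assumption for illustration: the cited study may define agreement differently, and the variant tuples shown are made up.

```python
from typing import Set, Tuple

# A variant is identified here by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def concordance(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Jaccard index of two variant lists: shared calls / all calls from either list."""
    union = calls_a | calls_b
    if not union:
        return 1.0  # two empty lists agree trivially
    return len(calls_a & calls_b) / len(union)


# Hypothetical output of two pipelines run on the same patient data.
calls_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
calls_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}

agreement = concordance(calls_a, calls_b)
print(f"{agreement:.0%}")
```

In this toy case the pipelines share two of four distinct variants, so they agree by 50%.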
Detection quality in NGS genetic tests varies between clinical laboratories, increasing the likelihood of low quality at a high price.
Information about the quality of detection in NGS genetic tests published by laboratories is most often non-detailed, incomplete, limited in access, or sometimes completely absent.
/ WHAT SOLUTIONS DO WE OFFER
Tool for searching NGS tests in diagnostic laboratories by time, price and different quality parameters.
Clinical Labs from Europe
Categories of diseases in search filter
WES WGS Panels Sanger MLPA PCR Microarray Other
Search by NGS sequencing machine
Every genetic test relies on a machine that reads DNA.
The quality of the test depends on the model of the sequencing machine.
To support decision making, we created a rating based on the quality of the sequencing instruments that are widely used in NGS genetic tests. The rating is based on data provided by the manufacturers and on independent studies.
We sort these instruments in decreasing order of the quality of the tests performed on them: 1 is the best quality, 6 is the lowest.
NGS methods read sample DNA in short segments (reads).
DNA is cut into reads during sample preparation. To amplify the signal for the sequencing instrument, the sample DNA is copied and randomly cut many times for reading. The reads are then aligned to the reference human DNA. Coverage is the number of reads that cover each nucleotide (base pair).
The quality of tests depends on coverage.
Laboratories usually publish 3 metrics about their test coverage:
Average absolute coverage - the mean value of coverage across all sequenced nucleotides. The higher the value, the better the quality of the test.
% of covered nucleotides - the percentage of all sequenced nucleotides that are covered at least a certain number of times (at least X times). The higher the value, the better the quality of the test.
Coverage threshold - the coverage that a certain % of sequenced nucleotides reaches or exceeds. The higher the value, the better the quality of the test.
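The three coverage metrics can be illustrated on a list of per-nucleotide coverage values. This is a minimal Python sketch, assuming that "average" means the arithmetic mean and that coverage is available as one integer per sequenced nucleotide. The function names and the toy values are hypothetical.

```python
import math
from typing import Sequence


def mean_coverage(depths: Sequence[int]) -> float:
    """Average absolute coverage: arithmetic mean over all nucleotides (non-empty input)."""
    return sum(depths) / len(depths)


def percent_covered(depths: Sequence[int], min_depth: int) -> float:
    """Percentage of nucleotides covered at least `min_depth` times."""
    return 100 * sum(d >= min_depth for d in depths) / len(depths)


def coverage_threshold(depths: Sequence[int], percent: float) -> int:
    """Largest coverage X such that at least `percent` % of nucleotides have coverage >= X."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(depths, reverse=True)
    needed = math.ceil(len(ordered) * percent / 100)  # how many nucleotides must reach X
    return ordered[needed - 1]


# Toy example: coverage at five nucleotide positions.
depths = [0, 10, 20, 30, 40]
print(mean_coverage(depths))           # mean coverage
print(percent_covered(depths, 20))     # % of nucleotides covered at least 20 times
print(coverage_threshold(depths, 80))  # coverage reached by at least 80% of nucleotides
```

The second and third functions answer the same question from opposite directions: one fixes the coverage and reports the percentage, the other fixes the percentage and reports the coverage.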
Search by type of disorders, body systems.
967 Categories of diseases in search tree filter in total.
41 root categories.
We provide an independent assessment of the quality of the data processing (the pipeline) that a laboratory previously performed for a particular patient.
We estimate the quality of the data as they come out of the sequencing machine, i.e. the initial quality.
We then compare the initial quality with the quality of the whole data processing (the pipeline).
Comparing the initial and final quality shows what needs to be improved in data processing to detect all true genomic variants (mutations).
We base our estimate on metrics that reflect quality from different sides and give the full picture of what has been done:
- Q, nucleotide quality
- Nucleotide error rate, %
- Number of nucleotides in the variant allele
- Mapped reads, % and absolute number
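The first two metrics are directly related if Q is the standard Phred quality score, which is an assumption here since the text does not define Q. In that convention Q = -10 * log10(p), where p is the probability that a nucleotide was read incorrectly. A minimal Python sketch of the conversion, with hypothetical function names:

```python
import math


def phred_to_error_rate_percent(q: float) -> float:
    """Convert a Phred quality score Q into the expected error rate in percent."""
    return 100 * 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) into a Phred quality score."""
    return -10 * math.log10(p)


# Q20 means 1 error in 100 nucleotides (1%), Q30 means 1 in 1000 (0.1%).
for q in (20, 30):
    print(q, phred_to_error_rate_percent(q))
```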
Based on the estimates above, the needed improvements in variant detection can be made with our pipeline or with any other pipeline in an appropriate lab chosen by the patient.
From a technical point of view, the accuracy of the detection of variants depends on the quality of the stages of data processing.
We perform automatic selection of appropriate bioinformatics tools and their parameters for improvement of main pipeline stages:
- error correction of nucleotides in reads,
- base quality score recalibration,
- variant calling,
- filtering of true variants from false variants.
This selection is made from a wide space of possible combinations and chooses the best combination of software and parameters for the particular patient's data.
We detect genetic events such as SNPs, indels and CNVs, with special effort on events that are commonly detected poorly, such as indels longer than 20 bp and CNVs shorter than 1 kb, as well as CNVs of all other sizes.
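The idea of choosing a combination from a space of options can be sketched as an exhaustive search: try every combination of tools across the stages and keep the one with the best score. The Python sketch below is only an illustration under stated assumptions. The text does not say how the selection is actually implemented, the tool names are placeholders, and the scoring function is a toy lookup table. In practice the score might be, for example, an F-score measured against a validated set of variants.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Combination = Dict[str, str]  # stage name -> chosen tool or parameter set


def select_best_combination(
    options: Dict[str, Sequence[str]],
    score: Callable[[Combination], float],
) -> Tuple[Combination, float]:
    """Score every combination of per-stage options and return the best one."""
    stages = list(options)
    best_combo: Combination = {}
    best_score = float("-inf")
    for choice in product(*(options[stage] for stage in stages)):
        combo = dict(zip(stages, choice))
        value = score(combo)
        if value > best_score:
            best_combo, best_score = combo, value
    return best_combo, best_score


# Placeholder tools and made-up scores, for illustration only.
TOY_SCORES = {
    "corrector_A": 0.1, "corrector_B": 0.2,
    "caller_A": 0.5, "caller_B": 0.4,
    "filter_A": 0.1, "filter_B": 0.3,
}


def toy_score(combo: Combination) -> float:
    """Hypothetical quality score: sum of per-tool scores."""
    return sum(TOY_SCORES[tool] for tool in combo.values())


options = {
    "error_correction": ["corrector_A", "corrector_B"],
    "variant_calling": ["caller_A", "caller_B"],
    "filtering": ["filter_A", "filter_B"],
}
best, best_value = select_best_combination(options, toy_score)
print(best, best_value)
```

Exhaustive search is easy to reason about but grows multiplicatively with the number of stages and options, so a real search over a wide space would likely need a cheaper strategy.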
All variants that we detect can be confirmed with methods routinely used in labs.
Patients can choose a lab themselves, or we can help with the selection. Given the list of detected variants as input, any lab can conduct clinical interpretation of the new findings.