Discovery and allele frequency estimation of somatic twilight zone insertions and deletions

If you have a question about this talk, please contact Florian Markowetz.

Locating somatic mutations on the basis of next-generation sequencing (NGS) data from disease-control matched samples is an essential step in cancer and other clinical research. Detecting these genetic variants remains a major challenge, not only because of the impurity and heterogeneity of the disease sample, but also because of the inherent uncertainty in this type of data. The main sources of ‘noise’ in NGS data are commonly thought to be alignment and typing uncertainties: the former refers to the fact that the origin of a read on the genome is unknown, and the latter reflects the often limited confidence one has in whether a read stems from an allele-affected chromosome or not. Calling somatic twilight zone (or mid-size) indels is considered exceptionally hard because of the high typing uncertainties involved. We present a maximum likelihood approach that allows us to robustly estimate twilight zone indel allele frequencies while taking the individual read alignment and typing uncertainties into account. By means of likelihood factorization, we can estimate the allele frequencies in the disease sample and in the control sample simultaneously, while also accounting for impurity. In addition, we define a likelihood-ratio based significance test for the presence or absence of a somatic mutation. This statistical framework is not restricted to somatic twilight zone indel calling: it also allows for other applications, such as robustly genotyping di- and polyploid cells or de novo twilight zone indel calling, all while accounting for alignment and typing uncertainties.
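To make the idea of estimating an allele frequency under typing uncertainty concrete, here is a minimal single-sample sketch in Python. It is not the model presented in the talk: it assumes each read comes with a probability that it stems from the indel-carrying allele, and it ignores alignment uncertainty, sample impurity, and the joint disease/control estimation. The names `alt_probs`, `log_likelihood`, `estimate_frequency`, and `likelihood_ratio_statistic` are hypothetical and chosen for illustration only.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def log_likelihood(f, alt_probs):
    """Log-likelihood of allele frequency f.

    alt_probs[i] is the (assumed given) probability that read i stems
    from the indel-carrying allele, i.e. its typing uncertainty.
    """
    return sum(
        math.log(max(f * p + (1.0 - f) * (1.0 - p), EPS)) for p in alt_probs
    )


def estimate_frequency(alt_probs, tol=1e-9):
    """Maximize the log-likelihood over f in [0, 1] by ternary search.

    Each term is the log of a function linear in f, so the sum is
    concave and ternary search is sufficient for this toy model.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, alt_probs) < log_likelihood(m2, alt_probs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def likelihood_ratio_statistic(alt_probs):
    """Twice the log-likelihood gain of the estimate over f = 0 (no indel)."""
    f_hat = estimate_frequency(alt_probs)
    gain = log_likelihood(f_hat, alt_probs) - log_likelihood(0.0, alt_probs)
    return max(0.0, 2.0 * gain)


# Hypothetical reads: three confidently support the indel, seven do not.
reads = [1.0] * 3 + [0.0] * 7
f_hat = estimate_frequency(reads)
stat = likelihood_ratio_statistic(reads)
```

With perfectly typed reads the estimate reduces to the fraction of indel-supporting reads; as the per-read probabilities move towards 0.5, the likelihood flattens and the statistic shrinks, which is the effect that high typing uncertainty has on twilight zone indels in this simplified picture.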

In summary, our results indicate that this is the first tool able to discover ‘somatic twilight zone insertions and deletions’ (indels of size 30-120 bp) with sufficient recall and precision. So far hardly any such somatic indels have been discovered; as a consequence, their extent and their potential effects remain largely unexplored.

This talk is part of the Seminars on Quantitative Biology @ CRUK Cambridge Institute series.
