Analysis of genomic data from high-throughput sequencing: concepts and basic methods
Abstract
When a reference genome is available, analysing Next Generation Sequencing (NGS) reads requires determining the plausible genomic position(s) of every read. This computational step is termed mapping. This lecture will provide an overview of the concepts, algorithms, and pitfalls underlying the mapping problem: alignment, approximate motif searching, filtration, and indexing data structures (such as the Burrows-Wheeler Transform). We will illustrate how the read length, the background mapping probability, and sequencing errors affect the results. This will lead us to present current mapping algorithms and their limitations, together with the results of a comparison. Both genomic and transcriptomic data will be considered in this regard, and finally the question of efficiency will be addressed. In a second part, we will give a short overview of the concepts behind assembly methods.
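To make the indexing idea concrete, here is a minimal sketch, not the lecture's own implementation, of how a Burrows-Wheeler Transform can serve as an index for exact read matching. It builds the BWT of a toy reference from a naively sorted suffix array, then uses FM-index backward search to count and locate exact occurrences of a read. The names `build_index` and `exact_matches` are hypothetical, and the sketch assumes the `$` sentinel does not occur in the genome. It deliberately omits what practical mappers add on top: tolerance of mismatches and gaps, a sampled (memory-efficient) occurrence table, and mapping-quality scoring.

```python
from typing import NamedTuple


class BWTIndex(NamedTuple):
    sa: list[int]               # suffix array of genome + "$"
    bwt: str                    # Burrows-Wheeler Transform of genome + "$"
    C: dict[str, int]           # C[c] = number of characters in the text smaller than c
    occ: dict[str, list[int]]   # occ[c][i] = occurrences of c in bwt[:i]


def build_index(genome: str) -> BWTIndex:
    # Assumption: "$" is absent from the genome and sorts before A, C, G, T.
    text = genome + "$"
    # Naive suffix array: sort suffix start positions lexicographically
    # (fine for a toy example, far too slow for a real genome).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[i] is the character preceding the i-th smallest suffix;
    # text[-1] == "$" handles the wrap-around for the suffix starting at 0.
    bwt = "".join(text[i - 1] for i in sa)

    counts: dict[str, int] = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    C: dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # Full (unsampled) occurrence table: O(n * alphabet size) memory.
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return BWTIndex(sa, bwt, C, occ)


def exact_matches(read: str, index: BWTIndex) -> list[int]:
    # Backward search: process the read from its last character to its first,
    # keeping the half-open range [lo, hi) of suffix-array rows whose
    # suffixes start with the part of the read processed so far.
    lo, hi = 0, len(index.sa)
    for ch in reversed(read):
        if ch not in index.C:       # character absent from the reference
            return []
        lo = index.C[ch] + index.occ[ch][lo]
        hi = index.C[ch] + index.occ[ch][hi]
        if lo >= hi:                # range became empty: no exact match
            return []
    # Each remaining row gives one genomic start position of the read.
    return sorted(index.sa[lo:hi])


if __name__ == "__main__":
    idx = build_index("ACGTACGA")
    print(idx.bwt)                      # AGT$AACCG
    print(exact_matches("ACG", idx))    # [0, 4]
    print(exact_matches("GTT", idx))    # []
```

Each read character costs a constant number of table lookups, so the search time depends on the read length rather than the genome size. This property is what makes BWT-based indexes attractive for mapping millions of short reads. The list returned for a short read such as `"A"` also shows why read length matters: the shorter the read, the more positions it matches by chance.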