Detecting microsatellites within genomes: no exact solution?

Sébastien Leclercq; Eric Rivals; Philippe Jarne

Conference paper. Year: 2006

Sébastien Leclercq

Role: Author
PersonId: 735520
IdHAL: sebastien-leclercq
ORCID: 0000-0002-3601-2316
IdRef: 122697340

Centre d’Ecologie Fonctionnelle et Evolutive

Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Eric Rivals

Role: Author
PersonId: 2002
IdHAL: eric-rivals
ORCID: 0000-0003-3791-3973
IdRef: 118021850

Méthodes et Algorithmes pour la Bioinformatique

Philippe Jarne

Role: Author
PersonId: 179320
IdHAL: philippe-jarne
ORCID: 0000-0002-6291-1999
IdRef: 067410030

Centre d’Ecologie Fonctionnelle et Evolutive

Abstract

Microsatellites are short tandem repeats (period of 1 to 6 bp) that are present in the genomes of all living organisms. In some species they account for a significant proportion of the DNA, for example approximately 3% of the Homo sapiens genome (International HGSC, 2001). Some of these elements show remarkable hypermutability, with an average mutation rate on the order of 0.001 in humans, caused primarily by the insertion or deletion of one or more repeats. These length variations result from a specific molecular mechanism called DNA slippage, which is not yet well understood (Goldstein & Schlotterer, 1999). Microsatellites have been used extensively as molecular markers for many years, but the study of their evolution began only about a dozen years ago. One technique is to compare theoretical length distributions (generated from mutation models) with real distributions. The latter are obtained from known microsatellite loci (Jarne et al. 1998, Dettmann & Taylor 2004), or by extraction from genomic sequences, either with custom algorithms (Kruglyak et al. 1998, Dieringer & Schlotterer 2003, Sainudiin et al. 2004) or with dedicated software based on advanced algorithmic notions. About a dozen such algorithms have been published since 1997, and they can be grouped into four major classes: (1) alignment against a consensus sequence, (2) combinatorial algorithms for repeat identification, (3) heuristic approaches based on statistical criteria, and (4) methods based on the compressibility of repeated sequences. Here we present some of these algorithms and compare their major differences. Four programs were chosen, each representing one of the above classes: RepeatMasker (http://repeatmasker.org) for sequence alignment, Mreps (Kolpakov et al. 2003) for the combinatorial method, TRF (Benson 1999) for the statistical method, and STAR (Delgrange & Rivals 2004) for the compression method.
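To make the object of study concrete, the following is a minimal sketch of a naive detector for perfect microsatellites (period 1 to 6 bp) using a regular expression with a backreference. It is not the method used by RepeatMasker, Mreps, TRF or STAR; the function name and the `min_copies` threshold are assumptions chosen for illustration, and imperfect (degraded) repeats are ignored entirely.

```python
import re

def find_perfect_microsatellites(seq, max_period=6, min_copies=3):
    """Return sorted (start, end, motif, copies) tuples for perfect tandem repeats.

    Illustrative sketch only: `min_copies` is an assumed threshold, and
    repeats containing substitutions or indels are not detected.
    Coordinates are 0-based, with `end` exclusive.
    """
    seq = seq.upper()
    hits = []
    for period in range(1, max_period + 1):
        # A motif of `period` bases followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (period, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); the shorter period reports those.
            if any(period % p == 0 and motif == motif[:p] * (period // p)
                   for p in range(1, period)):
                continue
            copies = (m.end() - m.start()) // period
            hits.append((m.start(), m.end(), motif, copies))
    return sorted(hits)

print(find_perfect_microsatellites("CCTGACACACACAGGT"))
```

Even in this toy version, the result depends on an arbitrary threshold (`min_copies`), which hints at why different detectors can report different sets of loci.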
Each program has its own parameters, constraints and output format, so the data must be normalized before algorithms can be compared. The comparisons are based on four microsatellite features: length, degree of perfection (i.e. the percentage of mutations), repeat length, and chromosomal position. First observations show that, within a single algorithm, the choice of parameters can strongly influence the distribution of detected microsatellites. For example, the number of TRF detections can vary by a factor of 20 simply by changing the minimum score parameter. To take these variations into account in the inter-algorithm comparison, we chose the STAR distribution as a reference (STAR takes no parameters) and calibrated the parameters of each other algorithm to obtain the distribution closest to this reference. Results of the inter-algorithm comparison on the human X chromosome show a significant divergence in detection. TRF and Mreps detect many more tandem repeats than STAR and RepeatMasker, particularly at small lengths. On the other hand, STAR and TRF are more stringent for highly degraded microsatellites. This study highlights that the way microsatellites are detected can change the biological models fitted to them, and ultimately lead to mistaken interpretations.
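The calibration step described above can be sketched as follows. The abstract does not say how closeness between distributions was measured, so this example assumes the total variation distance between normalized length distributions; `calibrate`, `detect` and the candidate parameter values are hypothetical names introduced only for illustration.

```python
from collections import Counter

def length_distribution(lengths):
    """Normalized histogram: microsatellite length -> proportion of detections."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions (assumed metric)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def calibrate(reference_lengths, detect, candidate_params):
    """Pick the parameter whose detected length distribution is closest to the reference.

    `detect` is a hypothetical callable mapping a parameter value to the list
    of lengths reported by a detector run with that value.
    """
    ref = length_distribution(reference_lengths)
    return min(
        candidate_params,
        key=lambda param: total_variation(ref, length_distribution(detect(param))),
    )

# Toy data standing in for real detector output.
reference = [10, 10, 12, 20]
raw_hits = [6, 6, 8, 10, 10, 12, 20]
toy_detect = lambda min_len: [n for n in raw_hits if n >= min_len]
print(calibrate(reference, toy_detect, [6, 10, 15]))
```

Because the histograms are normalized, this sketch compares the shapes of the distributions only; comparing raw detection counts, as in the factor-of-20 observation, would require keeping the unnormalized totals as well.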

Keywords

microsatellite distribution tandem repeats VNTR algorithm comparison evolutionary model

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]; Population Genetics [q-bio.PE]; Genomics, Transcriptomics and Proteomics [q-bio.GN]

Contributor: Eric Rivals

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00120181

Submitted on: Wednesday, December 13, 2006, 15:08:45

Last modified on: Tuesday, November 5, 2024, 14:26:18

Dates and versions

lirmm-00120181, version 1 (13-12-2006)

Identifiers

HAL Id: lirmm-00120181, version 1

Cite

Sébastien Leclercq, Eric Rivals, Philippe Jarne. Detecting microsatellites within genomes: no exact solution?. Integrative Post-Genomics, Nov 2006, Lyon, pp.27. ⟨lirmm-00120181⟩

Collections

EPHE CNRS UNIV-MONTP3 INRA CEFE MAB LIRMM AGROPOLIS PSL MIPS B3ESTE UNIV-MONTPELLIER INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER

311 views

0 downloads
