C. A. Learn, Resistance to Tyrosine Kinase Inhibition by Mutant Epidermal Growth Factor Receptor Variant III Contributes to the Neoplastic Phenotype of Glioblastoma Multiforme, Clinical Cancer Research, vol.10, issue.9, pp.3216-3224, 2004.

Z. Zhang, J. Wu, Q. Luo, Q. Liu, Q. Wu et al., Pygo2 activates MDR1 expression and mediates chemoresistance in breast cancer via the Wnt/β-catenin pathway, Oncogene, vol.35, issue.36, pp.4787-4797, 2016.

N. Martín-Martín, Stratification and therapeutic potential of PML in metastatic breast cancer, Nat Commun, vol.7, p.12595, 2016.

R. L. Grossman, A. P. Heath, V. Ferretti, H. E. Varmus, D. R. Lowy et al., Toward a Shared Vision for Cancer Genomic Data, New England Journal of Medicine, vol.375, issue.12, pp.1109-1112, 2016.

J. Audoux, N. Philippe, R. Chikhi, M. Salson, M. Gallopin et al., DE-kupl: exhaustive capture of biological variation in RNA-seq data through k-mer decomposition, Genome Biology, vol.18, issue.1, p.243, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01728770

J. M. Kirk, S. O. Kim, K. Inoue, M. J. Smola, D. M. Lee et al., Functional classification of long non-coding RNAs by k-mer content, Nature Genetics, vol.50, issue.10, pp.1474-1482, 2018.

R. Ounit, S. Wanamaker, T. J. Close, and S. Lonardi, CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers, BMC Genomics, vol.16, issue.1, p.236, 2015.

F. P. Breitwieser, D. N. Baker, and S. L. Salzberg, KrakenUniq: confident and fast metagenomics classification using unique k-mer counts, Genome Biology, vol.19, issue.1, 2018.

A. Thomas, S. Barriere, L. Broseus, J. Brooke, C. Lorenzi et al., GECKO is a genetic algorithm to classify and explore high throughput sequencing data, Communications Biology, vol.2, issue.1, p.222, 2019.
URL : https://hal.archives-ouvertes.fr/lirmm-02163400

M. Kokot, M. Długosz, and S. Deorowicz, KMC 3: counting and manipulating k-mer statistics, Bioinformatics, vol.33, issue.17, pp.2759-2761, 2017.

G. A. Sacomoto, J. Kielbassa, R. Chikhi, R. Uricaru, P. Antoniou et al., KisSplice: de-novo calling alternative splicing events from RNA-seq data, BMC Bioinformatics, vol.13, issue.S6, p.5, 2012.

M. I. Love, W. Huber, and S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biology, vol.15, issue.12, p.550, 2014.

M. D. Robinson, D. J. McCarthy, and G. K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics, vol.26, issue.1, pp.139-140, 2009.

M. E. Ritchie, B. Phipson, D. Wu, Y. Hu, C. W. Law et al., limma powers differential expression analyses for RNA-sequencing and microarray studies, Nucleic Acids Research, vol.43, issue.7, pp.e47-e47, 2015.

T. Sterne-Weiler, R. J. Weatheritt, A. J. Best, K. C. Ha, and B. J. Blencowe, Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop, Molecular Cell, vol.72, issue.1, pp.187-200.e6, 2018.

A. Rahman, I. Hallgrímsdóttir, M. Eisen, and L. Pachter, Association mapping from sequencing reads using k-mers, eLife, vol.7, p.32920, 2018.

A. Drouin, S. Giguère, M. Déraspe, M. Marchand, M. Tyers et al., Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons, BMC Genomics, vol.17, issue.1, p.754, 2016.

T. Hastie et al., The Elements of Statistical Learning, second edition, Math Intell, vol.27, pp.83-88, 2017.

L. Breiman, Figure 3: Curves of out-of-bag errors rates and estimation of variable importance.

R. R. Bastien, Á. Rodríguez-Lescure, M. T. Ebbert, A. Prat, B. Munárriz et al., PAM50 Breast Cancer Subtyping by RT-qPCR and Concordance with Standard Clinical Molecular Markers, BMC Medical Genomics, vol.5, issue.1, p.44, 2012.

K. A. Hoadley, Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer, Cell, vol.173, pp.291-304, 2018.

E. Jeannot, L. Darrigues, M. Michel, M. Stern, J. Pierga et al., A single droplet digital PCR for ESR1 activating mutations detection in plasma, Oncogene, vol.39, issue.14, pp.2987-2995, 2020.

G. Ciriello, Comprehensive molecular portraits of invasive lobular breast cancer, Cell, vol.163, pp.506-525, 2015.

B. Han, N. Bhowmick, Y. Qu, S. Chung, A. E. Giuliano et al., FOXC1: an emerging marker and therapeutic target for cancer, Oncogene, vol.36, issue.28, pp.3957-3963, 2017.

Y. Yang, D. Li, N. Shen, X. Yu, J. Li et al., TPX2 promotes migration and invasion of human breast cancer cells, Asian Pacific Journal of Tropical Medicine, vol.8, issue.12, pp.1064-1070, 2015.

A. Thakkar, H. Raj, . Ravishankar, B. Muthuvelan, A. Balakrishnan et al., High Expression of Three-Gene Signature Improves Prediction of Relapse-Free Survival in Estrogen Receptor-Positive and Node-Positive Breast Tumors, Biomarker Insights, vol.10, p.BMI.S30559, 2015.

S. S. Bjørklund, A. Panda, S. Kumar, M. Seiler, D. Robinson et al., Widespread alternative exon usage in clinically distinct subtypes of Invasive Ductal Carcinoma, Scientific Reports, vol.7, issue.1, p.5568, 2017.

D. W. Huang, B. T. Sherman, and R. A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources, Nature Protocols, vol.4, issue.1, pp.44-57, 2008.

The Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma, Nature, vol.474, issue.7353, pp.609-615, 2011.

V. M. Villalobos, Y. C. Wang, and B. I. Sikic, Reannotation and Analysis of Clinical and Chemotherapy Outcomes in the Ovarian Data Set From The Cancer Genome Atlas, JCO Clinical Cancer Informatics, vol.2, issue.2, pp.1-16, 2018.

M. P. Goetz, K. R. Kalari, V. J. Suman, A. M. Moyer, J. Yu et al., Tumor Sequencing and Patient-Derived Xenografts in the Neoadjuvant Treatment of Breast Cancer, JNCI: Journal of the National Cancer Institute, vol.109, issue.7, p.306, 2017.

H. Yi, A. T. Raman, H. Zhang, G. I. Allen, and Z. Liu, Detecting hidden batch factors through data-adaptive adjustment for biological effects, Bioinformatics, vol.34, issue.7, pp.1141-1147, 2017.

R. Middleton, D. Gao, A. Thomas, B. Singh, A. Au et al., IRFinder: assessing the impact of intron retention on mammalian gene expression, Genome Biology, vol.18, issue.1, p.51, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01497240

X. Shi and X. Sun, Regulation of paclitaxel activity by microtubule-associated proteins in cancer chemotherapy, Cancer Chemotherapy and Pharmacology, vol.80, issue.5, pp.909-917, 2017.

V. A. Buljan, M. B. Graeber, R. M. Holsinger, D. Brown, B. D. Hambly et al., Calcium–axonemal microtubuli interactions underlie mechanism(s) of primary cilia morphological changes, Journal of Biological Physics, vol.44, issue.1, pp.53-80, 2017.

L. Fornecker, L. Muller, F. Bertrand, N. Paul, A. Pichot et al., Multi-omics dataset to decipher the complexity of drug resistance in diffuse large B-cell lymphoma, Scientific Reports, vol.9, issue.1, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02275054

N. K. Agarwal, C. Qu, K. Kunkulla, Y. Liu, and F. Vega, Transcriptional Regulation of Serine/Threonine Protein Kinase (AKT) Genes by Glioma-associated Oncogene Homolog 1, Journal of Biological Chemistry, vol.288, issue.21, pp.15390-15401, 2013.

C. Zhu, G. Chen, Y. Zhao, X. Gao, and J. Wang, Regulation of the Development and Function of B Cells by ZBTB Transcription Factors, Frontiers in Immunology, vol.9, 2018.

ncbi/sra-tools, NCBI - National Center for Biotechnology Information/NLM/NIH, 2020.

H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., The Sequence Alignment/Map format and SAMtools, Bioinformatics, vol.25, issue.16, pp.2078-2079, 2009.

W. Shen, S. Le, Y. Li, and F. Hu, SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation, PLOS ONE, vol.11, issue.10, p.e0163962, 2016.

, Supplementary file 9. Raw FastQC quality control files for sequencing data

G. Park, H. Hwang, P. Nicodème, and W. Szpankowski, Profiles of Tries, SIAM Journal on Computing, vol.38, issue.5, pp.1821-1880, 2009.

L. Dagum and R. Menon, OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998.

R. R. Curtin, M. Edel, M. Lozhnikov, Y. Mentekidis, S. Ghaisas et al., mlpack 3: a fast, flexible machine learning library, Journal of Open Source Software, vol.3, issue.26, p.726, 2018.

W. Dubitzky, M. Granzow, and D. P. Berrar, Fundamentals of Data Mining in Genomics and Proteomics, 2007.

C. E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mobile Computing and Communications Review, vol.5, issue.1, pp.3-55, 2001.

C. Sanderson and R. Curtin, Armadillo: a template-based C++ library for linear algebra, The Journal of Open Source Software, vol.1, issue.2, p.26, 2016.

J. D. Birdwell, A Netlib library for the systems and controls community, IEEE Symposium on Computer-Aided Control System Design

, Lightweight C++ command line option parser, vol.1, 2020.

S. van der Walt, S. C. Colbert, and G. Varoquaux, The NumPy array: a structure for efficient numerical computation, Comput. Sci. Eng, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00564007

W. McKinney, Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 2010.

F. Pedregosa, Getting Started with Scikit-learn for Machine Learning, Python® Machine Learning, vol.12, pp.93-117, 2019.

F. Comitani, SimpSOM, 2019.

G. M. Kurtzer, V. Sochat, and M. W. Bauer, Singularity: Scientific containers for mobility of compute, PLOS ONE, vol.12, issue.5, p.e0177459, 2017.

R. Patro, G. Duggal, M. I. Love, R. A. Irizarry, and C. Kingsford, Salmon provides fast and bias-aware quantification of transcript expression, Nature Methods, vol.14, issue.4, pp.417-419, 2017.

C. R. Williams, A. Baccarella, J. Z. Parrish, and C. C. Kim, Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq, BMC Bioinformatics, vol.18, issue.1, p.38, 2017.

A. Athar, A. Füllgrabe, N. George, H. Iqbal, L. Huerta et al., ArrayExpress update – from bulk to single-cell expression data, Nucleic Acids Research, vol.47, issue.D1, pp.D711-D715, 2018.

C. Lorenzi, S. Barriere, J. Villemin, L. Dejardin-bretones, A. Mancheron et al., iMOKA: k-mer based software to analyze large collections of sequencing data, Genome Biology, vol.21, issue.1, 2020.
URL : https://hal.archives-ouvertes.fr/lirmm-02987774
