Multiple Instance Learning Based on Mol2vec Molecular Substructure Embeddings for Discovery of NDM-1 Inhibitors

In this paper, we first present a new dataset of NDM-1 biological activities, compiled from a cleaned version of the NMDI database. A literature review enriched that database with 741 new compounds. Activities against NDM-1 are classified into three classes (inactive, weakly active, and strongly active compounds) using a unified labeling procedure that covers a range of different activity properties. Second, we restate the classification problem in the Multiple Instance Learning (MIL) setting by representing each compound as a collection of Mol2vec vectors, each corresponding to a specific substructure (either an atom or an atom together with its first neighbors). For the strongly active class, the MIL approach improves balanced accuracy by up to 45.7% and F1-score by up to 38.47% compared with the classical Machine Learning paradigm. Finally, we present a classification and ranking framework based on classifiers learned by a k-fold cross-validation (CV) procedure, with hyper-parameters tuned separately for each fold by Bayesian optimization. In the MIL setting, the top-3 and top-5 ranked accuracies for compounds classified as strongly active reach 100%.
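The difference between the MIL representation and the classical single-vector representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy 3-dimensional embeddings, the substructure identifiers `s1` to `s3`, the helper names, and the reading of "top-k ranked accuracy" as the fraction of truly strongly active compounds among the k highest-scored ones are all assumptions made for illustration.

```python
import numpy as np

def compound_to_bag(substructure_ids, embedding):
    """MIL view: keep one vector per substructure, shape (n_substructures, dim)."""
    return np.stack([embedding[s] for s in substructure_ids])

def bag_to_single_vector(bag):
    """Classical view (assumed here): collapse the bag by summing its vectors."""
    return bag.sum(axis=0)

def top_k_accuracy(scores, is_strongly_active, k):
    """Assumed definition: share of true strong actives among the k top-scored compounds."""
    order = np.argsort(-np.asarray(scores), kind="stable")[:k]
    return float(np.mean(np.asarray(is_strongly_active)[order]))

# Hypothetical embeddings; real Mol2vec vectors are learned and much larger.
embedding = {
    "s1": np.array([1.0, 0.0, 0.0]),
    "s2": np.array([0.0, 1.0, 0.0]),
    "s3": np.array([0.0, 0.0, 1.0]),
}

# One compound made of three substructure occurrences (made-up example).
bag = compound_to_bag(["s1", "s2", "s2"], embedding)
vector = bag_to_single_vector(bag)

# Made-up classifier scores and ground-truth labels for five compounds.
scores = [0.9, 0.2, 0.8, 0.4, 0.7]
labels = [1, 0, 1, 1, 0]
top2 = top_k_accuracy(scores, labels, k=2)
top3 = top_k_accuracy(scores, labels, k=3)
```

In this sketch the bag keeps the per-substructure vectors that a MIL classifier can inspect individually, whereas the summed vector discards which substructure contributed what.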

Keywords

Machine Learning, Multiple Instance Learning, Drug Discovery, NDM-1 inhibitors

Domains

Computer Science [cs], Mathematics [math], Chemistry

Main file

PACBB22_Papastergiou_preprint.pdf (257.16 KB)

Origin: Files produced by the author(s)

Contributor: Thomas Papastergiou

https://hal-lirmm.ccsd.cnrs.fr/lirmm-03859902

Submitted on: Friday, November 18, 2022, 14:05:58

Last modified on: Monday, May 27, 2024, 12:07:26

Long-term archiving on: Monday, February 20, 2023, 12:09:17

Dates and versions

lirmm-03859902, version 1 (18-11-2022)

Identifiers

HAL Id: lirmm-03859902, version 1
DOI : 10.1007/978-3-031-17024-9_6

Cite

Thomas Papastergiou, Jérôme Azé, Sandra Bringay, Maxime Louet, Pascal Poncelet, et al. Multiple Instance Learning Based on Mol2vec Molecular Substructure Embeddings for Discovery of NDM-1 Inhibitors. PACBB 2022 - 16th International Conference on Practical Applications of Computational Biology and Bioinformatics, Jul 2022, L'Aquila, Italy. pp. 55-66, ⟨10.1007/978-3-031-17024-9_6⟩. ⟨lirmm-03859902⟩

Collections

CNRS UNIV-MONTP3 ADVANSE LIRMM INC-CNRS CHIMIE UNIV-MONTPELLIER AMIS
