On Term Selection Techniques for Patent Prior Art Search

Mona Golestan Far 1, 2 Scott Sanner 3 Mohamed Reda Bouadjenek 4 Gabriela Ferraro 2, 1 David Hawking 1, 5
4 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: In this paper, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, using the Description section of the patent query with Language Model (LM) and BM25 scoring functions. We find that an oracular relevance feedback system that extracts terms from the judged relevant documents far outperforms the baseline and achieves twice the MAP of the best competitor in CLEF-IP 2010. We find a very clear term selection value threshold for use when choosing terms. We also observe that most of the useful feedback terms are already present in the original query, and we hypothesize that the baseline system could be substantially improved by removing negative query terms. We tried four simple automated approaches to identify negative terms for query reduction, but none of them notably improved on the baseline performance. However, we show that a simple, minimal interactive relevance feedback approach, in which terms are selected from only the first retrieved relevant document, outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.
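To make the idea of threshold-based term selection and query reduction concrete, the following is a minimal sketch only. The abstract does not define the term selection value, so the score used here (a term's average frequency in relevant documents minus its average frequency in non-relevant documents) is an assumption, as are the function names `term_scores` and `reduce_query`, the toy documents, and the threshold of 0.

```python
from collections import Counter


def term_scores(relevant_docs, irrelevant_docs):
    """Score terms by average frequency in relevant docs minus average
    frequency in non-relevant docs (an assumed, illustrative definition)."""

    def avg_tf(docs):
        totals = Counter()
        for doc in docs:
            totals.update(doc)  # doc is a list of tokens
        n = max(len(docs), 1)   # avoid division by zero on empty input
        return {term: count / n for term, count in totals.items()}

    rel = avg_tf(relevant_docs)
    irr = avg_tf(irrelevant_docs)
    return {t: rel.get(t, 0.0) - irr.get(t, 0.0) for t in set(rel) | set(irr)}


def reduce_query(query_terms, scores, threshold=0.0):
    """Keep only query terms whose score exceeds the threshold."""
    return [t for t in query_terms if scores.get(t, 0.0) > threshold]


# Toy data (hypothetical): tokenized judged-relevant and non-relevant documents.
relevant = [["rotor", "blade", "turbine"], ["rotor", "turbine", "shaft"]]
irrelevant = [["method", "said", "rotor"], ["method", "claim"]]
query = ["rotor", "method", "turbine", "said"]

scores = term_scores(relevant, irrelevant)
reduced = reduce_query(query, scores)
print(reduced)
```

In this toy setting, terms that occur mostly in non-relevant documents receive negative scores and are dropped from the query, which mirrors the query reduction hypothesis described in the abstract. An oracular system would compute the scores from relevance judgments; an interactive variant could compute them from a single relevant document identified by the user.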
Document type:
Conference paper
SIGIR: Research and Development in Information Retrieval, Aug 2015, Santiago, Chile. ACM, 2015, SIGIR '15: 38th International SIGIR Conference on Research and Development in Information Retrieval. 〈http://www.sigir2015.org〉. 〈10.1145/2766462.2767801〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01163055
Contributor: Mohamed Reda Bouadjenek
Submitted on: Thursday, June 11, 2015 - 23:53:13
Last modified on: Saturday, January 27, 2018 - 01:31:48
Document(s) archived on: Tuesday, April 25, 2017 - 07:13:03

File

sp148-golestan-farA.pdf
Files produced by the author(s)

Citation

Mona Golestan Far, Scott Sanner, Mohamed Reda Bouadjenek, Gabriela Ferraro, David Hawking. On Term Selection Techniques for Patent Prior Art Search. SIGIR: Research and Development in Information Retrieval, Aug 2015, Santiago, Chile. ACM, 2015, SIGIR '15: 38th International SIGIR Conference on Research and Development in Information Retrieval. 〈http://www.sigir2015.org〉. 〈10.1145/2766462.2767801〉. 〈lirmm-01163055〉

Metrics

Record views

237

File downloads

336