On Term Selection Techniques for Patent Prior Art Search

Mona Golestan Far; Scott Sanner; Mohamed Reda Bouadjenek; Gabriela Ferraro; David Hawking

doi:10.1145/2766462.2767801

Conference paper, Year: 2015

Mona Golestan Far

Role: Author
PersonId: 967038

Australian National University

National ICT Australia [Sydney]

Scott Sanner

Role: Author
PersonId: 967039

Oregon State University

Mohamed Reda Bouadjenek

Role: Author
PersonId: 1245926
ORCID: 0000-0003-1807-430X

Scientific Data Management

Gabriela Ferraro

Role: Author
PersonId: 961736

National ICT Australia [Sydney]

Australian National University

David Hawking

Role: Author
PersonId: 967040

Australian National University

BING - Microsoft

Abstract

In this paper, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, using the Description section of the patent query with Language Model (LM) and BM25 scoring functions. We find that an oracular relevance feedback system that extracts terms from the judged relevant documents far outperforms the baseline and achieves twice the mean average precision (MAP) of the best competitor in CLEF-IP 2010. We find a very clear term selection value threshold for use when choosing terms. We also notice that most of the useful feedback terms are actually present in the original query, and we hypothesize that the baseline system could be substantially improved by removing negative query terms. We tried four simple automated approaches to identify negative terms for query reduction, but none of them notably improved on the baseline performance. However, we show that a simple, minimal interactive relevance feedback approach, in which terms are selected from only the first retrieved relevant document, outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.
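The idea of scoring feedback terms against a threshold and then reducing the query can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the score used here (mean term frequency in relevant documents minus mean term frequency in non-relevant documents) is an assumption made for illustration, as are the function names `mean_tf`, `select_terms`, and `reduce_query` and the toy documents.

```python
from collections import Counter


def mean_tf(docs):
    """Mean per-document frequency of each term over a list of token lists."""
    totals = Counter()
    for tokens in docs:
        totals.update(tokens)
    n = len(docs)
    return {term: count / n for term, count in totals.items()} if n else {}


def select_terms(relevant_docs, nonrelevant_docs, threshold=0.0):
    """Score each term and keep those scoring above `threshold`.

    Assumed score (not taken from the abstract): mean tf in relevant
    documents minus mean tf in non-relevant documents.
    """
    rel = mean_tf(relevant_docs)
    irr = mean_tf(nonrelevant_docs)
    vocabulary = set(rel) | set(irr)
    scores = {t: rel.get(t, 0.0) - irr.get(t, 0.0) for t in vocabulary}
    return {t: s for t, s in scores.items() if s > threshold}


def reduce_query(query_terms, selected):
    """Query reduction: drop query terms that were not selected."""
    return [t for t in query_terms if t in selected]


# Toy data, purely hypothetical.
relevant = [["valve", "pump", "fluid"], ["valve", "fluid", "seal"]]
nonrelevant = [["method", "fluid", "device"], ["method", "device", "system"]]
query = ["method", "valve", "fluid", "device"]

selected = select_terms(relevant, nonrelevant, threshold=0.0)
reduced = reduce_query(query, selected)
```

In this toy setting the generic terms "method" and "device" receive negative scores and are removed, leaving only query terms that are more frequent in the relevant documents. An oracular system has the relevance judgments needed to compute such scores; an automated system has to estimate them, which is the harder problem.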

Keywords

Patent Search; Query Reformulation

Domains

Computer Science [cs]

Main file

sp148-golestan-farA.pdf (256.51 KB)

Origin: Files produced by the author(s)

Contributor: Mohamed Reda Bouadjenek

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01163055

Submitted on: Thursday, June 11, 2015, 23:53:13

Last modified on: Friday, March 24, 2023, 14:53:00

Long-term archiving on: Tuesday, April 25, 2017, 07:13:03

Dates and versions

lirmm-01163055, version 1 (11-06-2015)

Identifiers

HAL Id: lirmm-01163055, version 1
DOI: 10.1145/2766462.2767801

Cite

Mona Golestan Far, Scott Sanner, Mohamed Reda Bouadjenek, Gabriela Ferraro, David Hawking. On Term Selection Techniques for Patent Prior Art Search. SIGIR: Research and Development in Information Retrieval, Aug 2015, Santiago, Chile. ⟨10.1145/2766462.2767801⟩. ⟨lirmm-01163055⟩

Collections

CNRS INRIA ZENITH LIRMM INRIA2 MIPS UNIV-MONTPELLIER
