10 Jul

Natural Language Processing and Information Retrieval

PubDate(1999), PubPlace() Author(Voorhees)
keyword(NLP,IR)

Summary

NLP doesn’t help ad-hoc retrieval.

Content

Background

Contribution

  • Term normalization may help.
    • Term mismatch is a major factor in degrading performance.
    • There is no evidence that a ‘meaning’ representation can boost performance.
  • Queries are troublesome for NLP-based IR.
  • With sufficient context, the bag-of-words (BoW) model implicitly disambiguates query terms.
    • e.g. a document similarity measure can be used to build a word sense classifier (occurrences of a word in similar documents can be considered to have the same meaning)
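
The implicit disambiguation point can be illustrated with a small sketch. This is not code from the paper: the two toy documents, the whitespace tokenizer, and the helper names `bow`, `cosine`, and `rank` are all assumptions made for illustration. It shows that a one-word query cannot separate two senses of "bank", while a longer query separates them through its co-occurring terms alone.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase token -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents using two different senses of "bank".
docs = {
    "finance": bow("the bank raised its loan interest rate"),
    "river": bow("a boat drifted toward the river bank"),
}


def rank(query_text):
    """Return {doc_name: similarity} for a query string."""
    q = bow(query_text)
    return {name: cosine(q, vec) for name, vec in docs.items()}


# The bare query matches both documents equally; the extra query terms
# pull the financial document ahead without any explicit sense tagging.
short_scores = rank("bank")
long_scores = rank("bank loan interest")
print(short_scores)
print(long_scores)
```

With the one-word query both documents tie, because each contains "bank" once and both have the same vector length. The three-word query scores the financial document higher, which is the sense in which context does the disambiguation.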

Experiment

  • Will ‘concept indexing’ based on WordNet synsets improve effectiveness?
    • Uses an extended vector space model (VSM): a linear interpolation of the similarities between the query and several vectors that represent the document
    • Disambiguation: each noun is disambiguated to a synset id
    • 3 subvectors: 1) words not disambiguated 2) synset ids of disambiguated nouns 3) stems of disambiguated nouns
  • Concept indexing hurt performance in most cases
    • Errors in word sense disambiguation caused additional term mismatches
    • Some queries were helped
  • The ‘101’ run, where non-stemmed verbs and adjectives are not counted as matches, performed worse than the baseline, which again shows the impact of term mismatch
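
The extended VSM scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: cosine is assumed as the per-subvector similarity, the subvector names (`words`, `synsets`, `noun_stems`), the weights, and the synset ids such as `bank#1` are hypothetical. The example simulates one disambiguation error, where the query noun is assigned a different synset than the same noun in the document.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def extended_similarity(query, doc, weights):
    """Weighted sum of per-subvector similarities (weights are assumed)."""
    return sum(w * cosine(query[name], doc[name]) for name, w in weights.items())


# Hypothetical query: the noun "bank" was (wrongly) tagged as sense 2.
query = {
    "words": Counter({"raise": 1}),       # words not disambiguated
    "synsets": Counter({"bank#2": 1}),    # synset ids of disambiguated nouns
    "noun_stems": Counter({"bank": 1}),   # stems of disambiguated nouns
}

# Hypothetical document: the same noun was tagged as sense 1.
doc = {
    "words": Counter({"raise": 1, "high": 1}),
    "synsets": Counter({"bank#1": 1, "rate#1": 1}),
    "noun_stems": Counter({"bank": 1, "rate": 1}),
}

# Score using noun stems versus score using synset ids for the nouns.
stem_score = extended_similarity(
    query, doc, {"words": 1.0, "synsets": 0.0, "noun_stems": 1.0})
synset_score = extended_similarity(
    query, doc, {"words": 1.0, "synsets": 1.0, "noun_stems": 0.0})
print(stem_score, synset_score)
```

In this toy case the stem subvector still matches "bank" while the synset subvector contributes nothing, so the synset-based score is lower. That is the term mismatch introduced by a disambiguation error.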

Future Work

Comment

Reference

Tags : Paper, IR