10 Jul
Natural Language Processing and Information Retrieval
PubDate(1999), PubPlace() Author(Voorhees)
keyword(NLP,IR)
Summary
NLP doesn’t help ad-hoc retrieval.
Content
Background
Contribution
- Term normalization may help
- Term mismatch is a major performance-degrading factor.
- No evidence that a ‘meaning’ representation can boost performance.
- Queries are troublesome for NLP-IR.
- With sufficient context, the bag-of-words (BoW) model implicitly disambiguates query terms.
- e.g. Using a document similarity measure to build a word sense classifier (occurrences of a word in similar documents can be considered to have the same meaning)
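A minimal sketch of the implicit-disambiguation point, assuming raw term-frequency vectors and cosine similarity. The toy documents, the queries, and the `cosine` helper are illustrative assumptions, not taken from the paper. With the single ambiguous term "bank" the two documents tie; adding context terms to the query separates them without any explicit sense tagging.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy documents using two different senses of "bank".
docs = {
    "finance": Counter("bank loan interest rate bank account".split()),
    "river": Counter("bank river water erosion bank flood".split()),
}

# A one-word query carries no context; a longer query does.
short_query = Counter("bank".split())
long_query = Counter("bank loan interest".split())

short_scores = {name: cosine(short_query, vec) for name, vec in docs.items()}
long_scores = {name: cosine(long_query, vec) for name, vec in docs.items()}

print("short query:", short_scores)  # both documents score the same
print("long query:", long_scores)    # the finance document scores higher
```

The co-occurring terms "loan" and "interest" do the disambiguating work here, which is the sense in which a BoW match over enough context selects the intended meaning.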
Experiment
- Will ‘concept indexing’ based on WordNet synsets improve retrieval effectiveness?
- Using an extended VSM: linear interpolation of the similarities between the query and the several vectors that make up a document's representation
- Disambiguation: each noun is disambiguated into a synset id
- 3 subvectors: 1) words not disambiguated, 2) synset ids of disambiguated nouns, 3) stems of disambiguated nouns
- Concept indexing hurt performance in most cases
- Errors in word sense disambiguation produced more term mismatches
- Some queries were helped
- The ‘101’ run, where non-stemmed verbs and adjectives are not counted as matches, performed worse than the baseline: the impact of term mismatch again
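The extended VSM scoring noted above can be sketched as a weighted sum of per-subvector similarities. This is an illustrative sketch under stated assumptions, not the paper's implementation: the inner-product similarity, the toy vectors, the synset id strings, and the names `SUBVECTORS`, `extended_similarity`, and `weights_from_label` are all hypothetical. Reading a run label such as ‘101’ as one weight per subvector is likewise an assumption made for the example.

```python
from typing import Dict, Sequence, Tuple

Vector = Dict[str, float]

# Order follows the three subvectors listed in the notes.
SUBVECTORS = ("words", "synsets", "noun_stems")

def inner(a: Vector, b: Vector) -> float:
    """Inner product of two sparse term-weight vectors."""
    return sum(w * b[t] for t, w in a.items() if t in b)

def extended_similarity(query: Dict[str, Vector],
                        doc: Dict[str, Vector],
                        weights: Sequence[float]) -> float:
    """Weighted sum of the similarities of corresponding subvectors."""
    return sum(alpha * inner(query.get(name, {}), doc.get(name, {}))
               for name, alpha in zip(SUBVECTORS, weights))

def weights_from_label(label: str) -> Tuple[float, ...]:
    """Assumed convention: one digit per subvector, e.g. '101' -> (1, 0, 1)."""
    return tuple(float(ch) for ch in label)

# Hypothetical case: the noun "bank" is assigned different (made-up) synset
# ids in the query and in the document because of a disambiguation error.
query = {
    "words": {"open": 1.0},
    "synsets": {"syn:bank_river": 1.0},
    "noun_stems": {"bank": 1.0},
}
doc = {
    "words": {"open": 1.0},
    "synsets": {"syn:bank_finance": 1.0},
    "noun_stems": {"bank": 1.0},
}

score_110 = extended_similarity(query, doc, weights_from_label("110"))
score_101 = extended_similarity(query, doc, weights_from_label("101"))
print(score_110, score_101)
```

In this toy case the synset subvector contributes nothing because the two ids differ, so a weighting that relies on it loses the noun match that the stem subvector would have found. That is one way a disambiguation error turns into a term mismatch.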