To Phrase or Not to Phrase - Impact of User versus System Term Dependence Upon Retrieval.
When submitting queries to information retrieval (IR) systems, users often have the option of specifying which, if any, of the query terms are heavily dependent on each other and should be treated as a fixed phrase, for instance by placing them between quotes. In addition to such cases where users specify term dependence, automatic ways also exist for IR systems to detect dependent terms in queries. Most IR systems use both user and algorithmic approaches. It is not clear, however, whether and to what extent user-defined term dependence agrees with algorithmic estimates of term dependence, nor which of the two may yield higher performance gains. Simply put, is it better to trust users or the system to detect term dependence in queries? To answer this question, we experiment with 101 crowdsourced search engine users and 334 queries (52 train and 282 test TREC queries), and we record 10 assessments per query. We find that (i) user assessments of term dependence differ significantly from algorithmic assessments of term dependence (their overlap is approximately 30%); (ii) there is little agreement among users about term dependence in queries, and this disagreement increases as queries become longer; (iii) the potential retrieval gain that can be obtained by treating term dependence (both user- and system-defined) over a bag-of-words baseline is limited to a small subset (approximately 8%) of the queries, and is much higher for low-depth than for deep precision measures. Points (ii) and (iii) constitute novel insights into term dependence.
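To make the notion of overlap between user and system assessments concrete, the following is a minimal sketch of one way such agreement could be quantified. The abstract does not state which overlap measure was used, so the Jaccard coefficient over dependent term pairs is an assumption here, as are the function names `phrase_pairs` and `jaccard_overlap` and the example query, which is invented for illustration and is not taken from the TREC query set.

```python
from itertools import combinations


def phrase_pairs(phrases):
    """Expand each phrase (a tuple of terms) into unordered dependent term pairs."""
    pairs = set()
    for phrase in phrases:
        # Sorting makes (a, b) and (b, a) map to the same pair.
        for a, b in combinations(sorted(set(phrase)), 2):
            pairs.add((a, b))
    return pairs


def jaccard_overlap(user_phrases, system_phrases):
    """Jaccard overlap of dependent term pairs (assumed measure, not the paper's)."""
    user = phrase_pairs(user_phrases)
    system = phrase_pairs(system_phrases)
    union = user | system
    if not union:
        # Neither side marked any dependence, so they trivially agree.
        return 1.0
    return len(user & system) / len(union)


# Hypothetical query: "new york hotel deals"
user_marked = [("new", "york"), ("hotel", "deals")]   # phrases a user quoted
system_marked = [("new", "york", "hotel")]            # phrase an algorithm detected
overlap = jaccard_overlap(user_marked, system_marked)
```

In this invented example the two annotations share only the pair (new, york) out of four distinct pairs, so the two sides agree on a quarter of the marked dependencies. Averaging such a score over all queries and assessors would give a single agreement figure, though whether that corresponds to the reported 30% depends on the measure the authors actually used.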