Posted to java-user@lucene.apache.org by Dmitry Serebrennikov <dm...@earthlink.net> on 2002/04/01 01:50:07 UTC

Re: Getting the terms that matched the HitDoc? & Relevance Feedback

>Subject: RE: Relevance Feedback
>From: Doug Cutting <DC...@grandcentral.com>
>Date: Sat, 30 Mar 2002 08:51:39 -0800
>To: Lucene Users List <lu...@jakarta.apache.org>
>
>Dmitry Serebrennikov [dmitrys@earthlink.net] has implemented a substantial
>extension to Lucene which should help folks doing this sort of research.  It
>provides an explicit vector representation for documents.  This way you can,
>e.g., retrieve a number of documents, efficiently sum their vectors, then
>derive a new query from the sum.  This code was posted to the list a long
>while back, but is now out of date.  As soon as the 1.2 release is final,
>and Dmitry has time, he intends to merge it into Lucene.
>
>Doug
>
Thanks, Doug.

This is true. The code was actually written for, and is being used for,
something closer to what is being discussed on the "Relevance Feedback"
thread: retrieving lists of terms based on matching documents. Given a
document, the API returns both its terms and their positions.
The API does have a cost, though. It adds three or four files per
index segment, and one of them is as large as the .prx file (provided
that you choose to "vectorize" every indexed field in the document).
Another limitation is that the code does not (yet) work with
unoptimized indexes (those with more than one segment).
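
To make the "sum their vectors, then derive a new query from the sum"
idea concrete, here is a rough sketch in plain Java. It does not use the
extension's actual API. The class name FeedbackSketch, the methods
sumVectors and topTerms, and the sample term counts are all made up for
illustration, and each document vector is assumed to be a simple map from
term text to term frequency.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these names are NOT part of Lucene or of the
// term-vector extension discussed in this thread.
class FeedbackSketch {

    // Add up the term frequencies of all document vectors into one vector.
    static Map<String, Integer> sumVectors(List<Map<String, Integer>> docVectors) {
        Map<String, Integer> sum = new HashMap<>();
        for (Map<String, Integer> vector : docVectors) {
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                sum.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return sum;
    }

    // Return the k terms with the highest summed frequency.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> topTerms(Map<String, Integer> sum, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(sum.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            terms.add(entries.get(i).getKey());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for two retrieved documents.
        List<Map<String, Integer>> hits = List.of(
            Map.of("lucene", 2, "index", 1),
            Map.of("lucene", 1, "vector", 3));
        Map<String, Integer> sum = sumVectors(hits);
        // The top terms become the clauses of the expanded query.
        System.out.println(String.join(" OR ", topTerms(sum, 2)));
    }
}
```

A real implementation would probably weight terms by something like
tf*idf rather than raw summed frequency, so that very common terms do
not dominate the new query.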

Dmitry.




