You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Amir Hossein Jadidinejad <am...@yahoo.com> on 2008/11/28 15:45:51 UTC

controlled indexing with Lucene

Hi, 
I'm going to index some documents only with known phrases. Let me describe:  
Suppose that I have a controlled vocabulary of phrases (A list of some candidate phrases). I intend to
index ONLY these phrases within my documents and have a retrieval model
(for example simple VS-TF.IDF) that each index item is a predefined
phrase.  
Could anyone tell me how to do it with Lucene?   
Greatly appreciate any comment or answer.
Kind regards.


      

Re: controlled indexing with Lucene

Posted by Erick Erickson <er...@gmail.com>.
I'm not entirely clear what you're trying to accomplish, but there's
a bunch of options.

You could just index the phrases as normal text, possibly with
positionincrementgaps (assuming more than one phrase is
possible in a field in a document). Then you could use SpanQuerys
to accomplish the "simple" form (choosing suitable increment
gaps and suitable distances on your span queries would insure that
hits wouldn't span phrases). You could then use regular queries
to accomplish the non-simple queries as they're pretty much unaffected
by increment gaps.

If that's total gibberish, perhaps a more complete example would allow
a clearer response.

Best
Erick

On Fri, Nov 28, 2008 at 9:45 AM, Amir Hossein Jadidinejad <
amir.jadidi@yahoo.com> wrote:

> Hi,
> I'm going to index some documents only with known phrases. Let me describe:
> Suppose that I have a controlled vocabulary of phrases (A list of some
> candidate phrases). I intend to
> index ONLY these phrases within my documents and have a retrieval model
> (for example simple VS-TF.IDF) that each index item is a predefined
> phrase.
> Could anyone tell me how to do it with Lucene?
> Greatly appreciate any comment or answer.
> Kind regards.
>
>
>