Posted to java-user@lucene.apache.org by Christopher Condit <co...@sdsc.edu> on 2010/09/21 21:11:25 UTC
Best practice for embedding extra information in an index
I'm curious about embedding extra information in an index (and being able to search the extra information as well). In this case certain tokens correspond to recognized entities with ids. I'd like to get the ids into the index so that searching for the id of the entity will also return that document. I can think of three ways and I was curious if there's a preferred way:
1) Add the id as another token during filtering
2) Add the id as a payload
3) Add the id as an attribute (although I don't know how to search on the attribute value)
Thanks,
-Chris
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Best practice for embedding extra information in an index
Posted by Erick Erickson <er...@gmail.com>.
Off the top of my head...
1) is certainly the easiest. This looks suspiciously like synonyms. That
is, at index time you inject the ID as a synonym in the text and it gets
indexed at the same position as the token. This helps because phrase
queries then continue to work. Lucene in Action has an example of
creating a synonym analyzer.
2) I don't see how payloads really help you here. I confess I'm not
intimately familiar with payloads, but what I've seen is that they're
useful when you match the *term* and want to do something special. Uses
I've seen are, for instance, parts of speech, so one can boost the score
of matches on nouns. But I don't recall seeing anything that allows the
payload data itself to be the match.
3) I have no idea what an attribute is in this context <G>... Although
you could simply create another field that contains all of the IDs for
the document and add a SHOULD clause on that field to all your queries.
HTH
Erick
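To illustrate why option 1 keeps phrase queries working, here is a minimal sketch in plain Java. It is a toy positional index, not Lucene code: the class name StackedTokenIndex, the entity dictionary, and the id "ID1234" are all made up for the example. In Lucene itself the analogous step would be a TokenFilter that emits the id with a position increment of 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy positional index (NOT the Lucene API) with ids stacked on entity tokens. */
class StackedTokenIndex {
    // term -> docId -> positions at which the term occurs in that document
    private final Map<String, Map<Integer, Set<Integer>>> postings = new HashMap<>();

    // Hypothetical entity dictionary: surface token -> entity id
    private final Map<String, String> entityIds;

    StackedTokenIndex(Map<String, String> entityIds) {
        this.entityIds = entityIds;
    }

    /** Splits on whitespace and indexes each token, plus its entity id if it has one. */
    void addDocument(int docId, String text) {
        String[] tokens = text.split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            post(tokens[pos], docId, pos);
            String id = entityIds.get(tokens[pos]);
            if (id != null) {
                // Same position as the original token, like an injected synonym.
                post(id, docId, pos);
            }
        }
    }

    private void post(String term, int docId, int pos) {
        postings.computeIfAbsent(term, t -> new HashMap<>())
                .computeIfAbsent(docId, d -> new TreeSet<>())
                .add(pos);
    }

    /** Returns the sorted ids of documents where the terms occur at consecutive positions. */
    List<Integer> phraseSearch(String... terms) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, Set<Integer>> first = postings.getOrDefault(
                terms[0], Collections.<Integer, Set<Integer>>emptyMap());
        for (Map.Entry<Integer, Set<Integer>> entry : first.entrySet()) {
            for (int start : entry.getValue()) {
                if (matchesFrom(entry.getKey(), start, terms)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    // True if terms[1..] occur at start+1, start+2, ... in the given document.
    private boolean matchesFrom(int docId, int start, String[] terms) {
        for (int i = 1; i < terms.length; i++) {
            Set<Integer> positions = postings.getOrDefault(
                    terms[i], Collections.<Integer, Set<Integer>>emptyMap()).get(docId);
            if (positions == null || !positions.contains(start + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> ids = new HashMap<>();
        ids.put("aspirin", "ID1234");
        StackedTokenIndex index = new StackedTokenIndex(ids);
        index.addDocument(1, "aspirin reduces fever");
        index.addDocument(2, "ibuprofen reduces fever");
        System.out.println(index.phraseSearch("ID1234"));
        System.out.println(index.phraseSearch("ID1234", "reduces"));
        System.out.println(index.phraseSearch("reduces", "fever"));
    }
}
```

Because "ID1234" sits at the same position as "aspirin", the phrase "ID1234 reduces" matches document 1 just as "aspirin reduces" would, while the unrelated phrase "reduces fever" still matches both documents.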