Posted to java-user@lucene.apache.org by Christopher Condit <co...@sdsc.edu> on 2010/09/21 21:11:25 UTC

Best practice for embedding extra information in an index

I'm curious about embedding extra information in an index (and being able to search the extra information as well). In this case certain tokens correspond to recognized entities with ids. I'd like to get the ids into the index so that searching for the id of the entity will also return that document. I can think of three ways and I was curious if there's a preferred way:
1) Add the id as another token during filtering
2) Add the id as a payload
3) Add the id as an attribute (although I don't know how to search on the attribute value)

Thanks,
-Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Best practice for embedding extra information in an index

Posted by Erick Erickson <er...@gmail.com>.
Off the top of my head...
1) is certainly the easiest. This looks suspiciously like synonyms. That is,
    at index time you inject the ID as a synonym in the text, and it gets
    indexed at the same position as the original token. This helps because
    phrase queries then continue to work. Lucene in Action has an example of
    creating a synonym analyzer.
2) I don't see how payloads really help you here. I confess I'm not
    intimately familiar with payloads, but from what I've seen they're useful
    when you match the *term* and want to do something special with the
    match. Uses I've seen include, for instance, parts of speech, so one can
    boost the score of matches on nouns. But I don't recall seeing anything
    that allows the payload data itself to be what is matched.
3) I have no idea what an attribute is in this context <G>. Alternatively,
    you could simply create another field that contains all of the IDs for
    the document and add a SHOULD clause on that field to all your queries.
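
To make the idea in 1) concrete, here is a minimal sketch in plain Java. It does not use Lucene's API at all: it is a toy positional index, and the class name, the entity map, and the ID "ent_001" are all made up for illustration. It only shows why posting the ID at the same position as the recognized token keeps phrase queries working, and why a search on the ID alone finds the document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy positional index for illustration only; this is NOT Lucene's API. */
class EntityIdIndexSketch {
    // term -> docId -> positions at which the term occurs in that document
    private final Map<String, Map<Integer, Set<Integer>>> postings = new HashMap<>();
    // Hypothetical entity lookup: recognized token -> entity ID
    private final Map<String, String> entityIds;

    EntityIdIndexSketch(Map<String, String> entityIds) {
        this.entityIds = entityIds;
    }

    /** Indexes whitespace-separated, lowercased tokens with their positions. */
    void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            post(tokens[pos], docId, pos);
            String id = entityIds.get(tokens[pos]);
            if (id != null) {
                // Inject the ID at the SAME position as the token,
                // the way a synonym would be injected.
                post(id, docId, pos);
            }
        }
    }

    private void post(String term, int docId, int pos) {
        postings.computeIfAbsent(term, t -> new HashMap<>())
                .computeIfAbsent(docId, d -> new HashSet<>())
                .add(pos);
    }

    /** Returns the IDs of documents containing the terms at consecutive positions. */
    Set<Integer> phraseSearch(String... terms) {
        Set<Integer> hits = new TreeSet<>();
        if (terms.length == 0) {
            return hits;
        }
        Map<Integer, Set<Integer>> first =
                postings.getOrDefault(terms[0], Collections.emptyMap());
        for (Map.Entry<Integer, Set<Integer>> entry : first.entrySet()) {
            int docId = entry.getKey();
            for (int start : entry.getValue()) {
                boolean match = true;
                for (int i = 1; i < terms.length && match; i++) {
                    Map<Integer, Set<Integer>> byDoc = postings.get(terms[i]);
                    Set<Integer> positions = (byDoc == null) ? null : byDoc.get(docId);
                    match = positions != null && positions.contains(start + i);
                }
                if (match) {
                    hits.add(docId);
                    break; // one matching occurrence per document is enough
                }
            }
        }
        return hits;
    }
}
```

With "aspirin" mapped to the made-up ID "ent_001" and a document "patients took aspirin daily", a search for the ID alone finds the document, and so do the phrases "took aspirin daily" and "took ent_001 daily", because the ID and the token share a position. In Lucene itself the equivalent would presumably be a token filter that emits the ID with a position increment of 0, along the lines of the synonym analyzer mentioned above.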

HTH
Erick
