Posted to java-user@lucene.apache.org by Dmitri Bichko <db...@gmail.com> on 2009/05/22 06:28:59 UTC

Retrieving payloads for terms matched by a query

Hi,

I may be missing something obvious, but how do I get the payloads for
the specific token positions that were matched by a query?

For example, if I have a phrase query like "A keyword B" that matches
the field "A keyword B A", I can get the payloads for A and B with
IndexReader.termPositions(), but that doesn't tell me which of the two
positions of A matched the query.  I can kind of work back to it, but
it quickly becomes difficult for more complex queries.

Here's what I'm trying to accomplish: I have several entity classes,
and I'd like to create an index that uses the class names as tokens and
stores the specific entity ids in the payloads.  That way, I should
be able to run queries in terms of the classes (e.g. '"CLASS_A
CLASS_B"~10 NOT CLASS_C') and then report the actual entities for
the matching documents. (I realize I'll need custom tokenizers/filters
to identify and tag the entities and handle class references in
queries, but that part seems pretty straightforward.)
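
As a rough illustration of the indexing side, here is a minimal sketch
in plain Java of the token stream described above: each recognized
entity mention becomes a token whose term is the class name and whose
payload is the entity id. It is not a real Lucene TokenFilter;
EntityTagger, TaggedToken and the dictionary format (surface form ->
{class name, entity id}) are hypothetical names made up for the
example, and whitespace splitting stands in for a real tokenizer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EntityTagger {
    /** One token as it would be indexed: class name as term, entity id as payload. */
    static final class TaggedToken {
        final String term;     // class name for entities, the raw word otherwise
        final int position;    // token position within the field
        final String payload;  // entity id, or null for ordinary words

        TaggedToken(String term, int position, String payload) {
            this.term = term;
            this.position = position;
            this.payload = payload;
        }

        @Override
        public String toString() {
            return term + "@" + position + (payload == null ? "" : "[" + payload + "]");
        }
    }

    /**
     * Splits the text on whitespace and replaces every word found in the
     * dictionary with its class name, keeping the entity id as the payload.
     * Words that are not in the dictionary pass through with a null payload.
     */
    static List<TaggedToken> tag(String text, Map<String, String[]> dictionary) {
        List<TaggedToken> out = new ArrayList<TaggedToken>();
        String[] words = text.trim().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            String[] entry = dictionary.get(words[pos]);
            if (entry != null) {
                out.add(new TaggedToken(entry[0], pos, entry[1]));
            } else {
                out.add(new TaggedToken(words[pos], pos, null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: surface form -> {class name, entity id}.
        Map<String, String[]> dictionary = new HashMap<String, String[]>();
        dictionary.put("aspirin", new String[] {"CLASS_A", "entity-17"});
        dictionary.put("cox1", new String[] {"CLASS_B", "entity-42"});
        System.out.println(tag("aspirin inhibits cox1", dictionary));
    }
}
```

A query such as "CLASS_A CLASS_B"~10 would then run against the class
name terms, while the entity ids travel with the individual positions.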

Does this sound workable?

Thanks,
Dmitri

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Retrieving payloads for terms matched by a query

Posted by Grant Ingersoll <gs...@apache.org>.
On May 22, 2009, at 12:28 AM, Dmitri Bichko wrote:

> Hi,
>
> I may be missing something obvious, but how do I get the payloads for
> the specific token positions that were matched by a query?

See SpanQuery.getPayloadSpans() and its SpanQuery derivatives.
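
To make the idea concrete, here is a small self-contained model of what
a span-style match gives you: the start and end position of each match
plus the payloads of only the tokens inside that window. It does not
call Lucene at all; PhraseMatchPayloads, Match and find are hypothetical
names, and the exact in-order phrase matching is a simplification of
what the span queries do. With the real API you would instead iterate
the spans returned by getPayloadSpans() and read the document, start
and end of each one along with its payloads.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseMatchPayloads {
    /** A match window [start, end) and the payloads of the tokens inside it. */
    static final class Match {
        final int start;
        final int end;
        final List<String> payloads;

        Match(int start, int end, List<String> payloads) {
            this.start = start;
            this.end = end;
            this.payloads = payloads;
        }
    }

    /**
     * Finds every exact, in-order occurrence of the phrase in the field's
     * terms and collects the non-null payloads at the matched positions only.
     * terms[i] and payloads[i] describe the token at position i.
     */
    static List<Match> find(String[] terms, String[] payloads, String[] phrase) {
        List<Match> matches = new ArrayList<Match>();
        for (int start = 0; start + phrase.length <= terms.length; start++) {
            boolean ok = true;
            for (int i = 0; i < phrase.length && ok; i++) {
                ok = terms[start + i].equals(phrase[i]);
            }
            if (!ok) {
                continue;
            }
            List<String> hit = new ArrayList<String>();
            for (int i = 0; i < phrase.length; i++) {
                if (payloads[start + i] != null) {
                    hit.add(payloads[start + i]);
                }
            }
            matches.add(new Match(start, start + phrase.length, hit));
        }
        return matches;
    }

    public static void main(String[] args) {
        // The field "A keyword B A" from the question, with made-up payloads.
        String[] terms = {"A", "keyword", "B", "A"};
        String[] payloads = {"a-1", null, "b-1", "a-2"};
        for (Match m : find(terms, payloads, new String[] {"A", "keyword", "B"})) {
            System.out.println(m.start + ".." + m.end + " " + m.payloads);
        }
    }
}
```

For the example field, the only match covers positions 0 to 2, so the
payload of the first A is reported and the payload of the trailing A
is not.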

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

