Posted to java-user@lucene.apache.org by Matt Ronge <mr...@theronge.com> on 2008/02/06 23:10:56 UTC
Faceting with payloads
Hi all,
I'm using the new payloads feature to assign types to tokens as I
index. The type is based on the surrounding text in the document, and
I want to filter my searches based on this token type.
For example, the token "house" may be found in different places with
different types. If the user's query contains "house", I want to
report the number of instances of the token "house" of type A, type B,
and so on.
Should I be using payloads for this? If so, I'd like to be able to
count up all the instances for each type. Then I can show the
results, along with TypeA (100 hits), TypeB (1000 hits), and so on.
If I could use something like a HitCollector that was passed the
token payload, that would be perfect, but it doesn't support that. Any
thoughts on how to go about this? Also, if I want to only allow tokens
of TypeB, how can I efficiently filter by TypeB? Using a Similarity
subclass seems like a hack.
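A minimal sketch of the counting and filtering asked about above, in
plain Java with no Lucene calls. The Occurrence class and the string
type label are assumptions made for illustration: they stand in for
whatever a custom Query would decode from the payload at each
position, and are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PayloadTypeCounts {

    /** Hypothetical stand-in for one occurrence of a term, with the type decoded from its payload. */
    static final class Occurrence {
        final int doc;
        final int position;
        final String type;

        Occurrence(int doc, int position, String type) {
            this.doc = doc;
            this.position = position;
            this.type = type;
        }
    }

    /** Counts occurrences per type. */
    static Map<String, Integer> countByType(List<Occurrence> occurrences) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Occurrence o : occurrences) {
            Integer old = counts.get(o.type);
            counts.put(o.type, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Keeps only the occurrences of the wanted type. */
    static List<Occurrence> filterByType(List<Occurrence> occurrences, String wanted) {
        List<Occurrence> kept = new ArrayList<Occurrence>();
        for (Occurrence o : occurrences) {
            if (o.type.equals(wanted)) {
                kept.add(o);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up occurrences of the token "house" in three documents.
        List<Occurrence> house = new ArrayList<Occurrence>();
        house.add(new Occurrence(0, 3, "TypeA"));
        house.add(new Occurrence(0, 17, "TypeB"));
        house.add(new Occurrence(1, 5, "TypeB"));
        house.add(new Occurrence(2, 1, "TypeA"));
        house.add(new Occurrence(2, 9, "TypeB"));

        System.out.println(countByType(house));                  // {TypeA=2, TypeB=3}
        System.out.println(filterByType(house, "TypeB").size()); // 3
    }
}
```

The sketch says nothing about efficiency: it assumes the occurrences
and their types are already in hand, which is exactly the part that
needs access to payloads at search time.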
Thanks in advance,
--
Matt
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Faceting with payloads
Posted by Grant Ingersoll <gs...@apache.org>.
On Feb 8, 2008, at 12:17 PM, Karl Wettin wrote:
>
> 6 feb 2008 kl. 23.10 skrev Matt Ronge:
>
>> The token "house" may be found in different places with different
>> types. If the user's query contains "house", I want to report the
>> number of instances of the token "house" of type A, type B, and so
>> on.
>>
>> Should I be using payloads for this? If so, I'd like to be able to
>> count up all the instances for each type. Then I can show the
>> results, along with TypeA (100 hits), TypeB (1000 hits), and so on.
>
> Perhaps. What do you do with these numbers you extract?
>
>> If I could use something like a HitCollector that was passed the
>> token payload, that would be perfect, but it doesn't support that.
>> Any thoughts on how to go about this? Also, if I want to only allow
>> tokens of TypeB, how can I efficiently filter by TypeB? Using a
>> Similarity subclass seems like a hack.
>
> I think you will have to hack the Query classes you want to use, and
> make their Weight share some counter thingy you can inspect after
> invoking a Query in the Searcher. But I'm not sure.
Yeah, I agree. Sounds like a custom Query class. Have a look at
BoostingTermQuery for an example of using payloads. I'm not sure yet
whether there is a generalized way to do what you want, but it does
seem useful. It seems almost like an extension of Spans: you want not
only the hits but also their positions, and then, on top of that,
information about the payload type.
I do have a JIRA issue open about adding Payloads access to spans, but
I haven't implemented anything I am happy with yet. With that, you
could iterate the spanPayloads and do your counting, too.
-Grant
Re: Faceting with payloads
Posted by Karl Wettin <ka...@gmail.com>.
9 feb 2008 kl. 00.53 skrev Matt Ronge:
>
> On Feb 8, 2008, at 11:17 AM, Karl Wettin wrote:
>
>>
>> 6 feb 2008 kl. 23.10 skrev Matt Ronge:
>>
>>> The token "house" may be found in different places with different
>>> types. If the user's query contains "house", I want to report the
>>> number of instances of the token "house" of type A, type B, and
>>> so on.
>>>
>>> Should I be using payloads for this? If so, I'd like to be able to
>>> count up all the instances for each type. Then I can show the
>>> results, along with TypeA (100 hits), TypeB (1000 hits), and so on.
>>
>> Perhaps. What do you do with these numbers you extract?
>
> I would like to display this to the user along with the search
> results. They can then see that there are 100 hits for TypeA, and
> can ask for results of just TypeA.
Does "100 hits" mean exactly 100 documents, or could it be 4
documents with 25 payloads each?
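The two readings give different numbers for the same data. A small
sketch in plain Java, assuming each occurrence has been reduced to a
hypothetical (document id, type) pair held in two parallel arrays;
none of these names come from Lucene.

```java
import java.util.HashSet;
import java.util.Set;

final class HitGranularity {

    /** Number of occurrences (payloads) of the wanted type; docs[i] and types[i] describe occurrence i. */
    static int occurrenceCount(int[] docs, String[] types, String wanted) {
        int n = 0;
        for (int i = 0; i < docs.length; i++) {
            if (types[i].equals(wanted)) {
                n++;
            }
        }
        return n;
    }

    /** Number of distinct documents with at least one occurrence of the wanted type. */
    static int documentCount(int[] docs, String[] types, String wanted) {
        Set<Integer> seen = new HashSet<Integer>();
        for (int i = 0; i < docs.length; i++) {
            if (types[i].equals(wanted)) {
                seen.add(docs[i]);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // 4 documents (ids 0..3) with 25 TypeA occurrences each.
        int[] docs = new int[100];
        String[] types = new String[100];
        for (int i = 0; i < 100; i++) {
            docs[i] = i / 25;
            types[i] = "TypeA";
        }
        // prints "100 occurrences in 4 documents"
        System.out.println(occurrenceCount(docs, types, "TypeA")
                + " occurrences in " + documentCount(docs, types, "TypeA") + " documents");
    }
}
```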
> So on top of being able to count hits based on the payload, I'll
> need to run a query that looks at the payloads.
It is "not possible" to implement a query that searches for payloads.
Are you aware of how facets are usually implemented with Lucene? There
is a lot about it in the mail archives, and Solr does it out of the box.
You probably want a second field indexed with your type
classifications, and perhaps one field per type containing something
like size for sort/search/store.
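One common way to count facets over such a type field, sketched in
plain Java: keep one bit set of document ids per type value and
intersect it with the bit set of documents matching the query. This
is an illustration under assumptions, not Lucene's or Solr's actual
code. The class and method names are hypothetical and only
java.util.BitSet is a real API here. Note that it counts documents,
not individual token occurrences.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

final class TypeFieldFacets {

    /** Counts, for each type value, how many of the query's matching documents carry it. */
    static Map<String, Integer> facetCounts(BitSet queryHits, Map<String, BitSet> docsByType) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, BitSet> e : docsByType.entrySet()) {
            BitSet both = (BitSet) queryHits.clone(); // copy so the inputs stay untouched
            both.and(e.getValue());
            counts.put(e.getKey(), both.cardinality());
        }
        return counts;
    }

    /** Restricts the query's hits to documents that carry the given type value. */
    static BitSet restrictToType(BitSet queryHits, BitSet docsOfType) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(docsOfType);
        return result;
    }

    /** Convenience helper: builds a bit set from a list of document ids. */
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        // Made-up index: which documents have which value in the type field.
        Map<String, BitSet> docsByType = new LinkedHashMap<String, BitSet>();
        docsByType.put("TypeA", bits(0, 2, 5));
        docsByType.put("TypeB", bits(1, 2, 3, 4));

        // Made-up result of the query "house".
        BitSet hits = bits(0, 1, 2, 4);

        System.out.println(facetCounts(hits, docsByType));                  // {TypeA=2, TypeB=3}
        System.out.println(restrictToType(hits, docsByType.get("TypeB"))); // {1, 2, 4}
    }
}
```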
karl
Re: Faceting with payloads
Posted by Matt Ronge <mr...@theronge.com>.
On Feb 8, 2008, at 11:17 AM, Karl Wettin wrote:
>
> 6 feb 2008 kl. 23.10 skrev Matt Ronge:
>
>> The token "house" may be found in different places with different
>> types. If the user's query contains "house", I want to report the
>> number of instances of the token "house" of type A, type B, and so
>> on.
>>
>> Should I be using payloads for this? If so, I'd like to be able to
>> count up all the instances for each type. Then I can show the
>> results, along with TypeA (100 hits), TypeB (1000 hits), and so on.
>
> Perhaps. What do you do with these numbers you extract?
I would like to display this to the user along with the search
results. They can then see that there are 100 hits for TypeA, and
can ask for results of just TypeA. So on top of being able to
count hits based on the payload, I'll need to run a query that looks
at the payloads.
--
Matt Ronge
Re: Faceting with payloads
Posted by Karl Wettin <ka...@gmail.com>.
6 feb 2008 kl. 23.10 skrev Matt Ronge:
> The token "house" may be found in different places with different
> types. If the user's query contains "house", I want to report the
> number of instances of the token "house" of type A, type B, and so
> on.
>
> Should I be using payloads for this? If so, I'd like to be able to
> count up all the instances for each type. Then I can show the
> results, along with TypeA (100 hits), TypeB (1000 hits), and so on.
Perhaps. What do you do with these numbers you extract?
> If I could use something like a HitCollector that was passed the
> token payload, that would be perfect, but it doesn't support that.
> Any thoughts on how to go about this? Also, if I want to only allow
> tokens of TypeB, how can I efficiently filter by TypeB? Using a
> Similarity subclass seems like a hack.
I think you will have to hack the Query classes you want to use, and
make their Weight share some counter thingy you can inspect after
invoking a Query in the Searcher. But I'm not sure.
karl