Posted to java-user@lucene.apache.org by Matt Ronge <mr...@theronge.com> on 2008/02/06 23:10:56 UTC

Faceting with payloads

Hi all,

I'm using the new payloads feature to assign types to tokens as I  
index. The type is based on the surrounding text in the document, and  
I want to filter my searches based on this token type.

For example, the token "house" may be found in different places with
different types. If the user's query contains "house", I want to report
the number of instances of the token "house" of type A, type B,
and so on.

Should I be using payloads for this? If so, I'd like to be able to
count up all the instances for each type. Then I can show the
results along with TypeA (100 hits), TypeB (1000 hits), and so on.

If I could use something like HitCollector that was passed in the  
token payload, that would be perfect, but it doesn't support that. Any  
thoughts on how to go about this? Also, if I want to only allow tokens  
of TypeB, how can I efficiently filter by TypeB; using a Similarity  
subclass seems like a hack.

Thanks in advance,
--
Matt

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Faceting with payloads

Posted by Grant Ingersoll <gs...@apache.org>.
On Feb 8, 2008, at 12:17 PM, Karl Wettin wrote:

>
> 6 feb 2008 kl. 23.10 skrev Matt Ronge:
>
>> I may index the token "house" maybe found in different places with  
>> different types. If the user query contains house, I want to report  
>> the number of instances of the token house of type A, type B and so  
>> on.
>>
>> Should I be using payloads for this? If so, I'd like to be able to  
>> count up all the instances of for each type. Then I can show the  
>> results, along with TypeA (100 hits), TypeB (1000 hits) so on.
>
> Perhaps. What do you do with the numbers you extract?
>
>> If I could use something like HitCollector that was passed in the  
>> token payload, that would be perfect, but it doesn't support that.  
>> Any thoughts on how to go about this? Also, if I want to only allow  
>> tokens of TypeB, how can I efficiently filter by TypeB; using a  
>> Similarity subclass seems like a hack.
>
> I think you will have to hack the Query classes you want to use, and  
> make their Weight share some counter thingy you can inspect after  
> invoking a Query in the Searcher. But I'm not sure.

Yeah, I agree.  Sounds like a custom Query class.  Have a look at the  
BoostingTermQuery for an example of using payloads.  Not sure yet if  
there is a generalized way to do what you want, but it does seem  
useful.  Seems almost like an expansion of Spans, in that you not only  
want the hits, but you want their positions, and then expanding it, to  
give info about payload type.

I do have a JIRA issue open about adding Payloads access to spans, but  
I haven't implemented anything I am happy with yet.  With that, you  
could iterate the spanPayloads and do your counting, too.
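
As a rough illustration of the counting step, here is a minimal sketch in plain Java. It does not use any Lucene classes: it assumes that a custom Query or a span iteration has already produced one record per matching occurrence of the term, carrying the document ID, the position, and a one-byte payload. The names Occurrence, TypeCount, TYPE_A and TYPE_B are hypothetical and chosen only for this example. Because a "hit" could mean either an occurrence or a document, the sketch tracks both.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of the idea; no Lucene classes are used here.
class PayloadTypeCounter {

    // Hypothetical one-byte type codes stored as the token payload.
    static final byte TYPE_A = 1;
    static final byte TYPE_B = 2;

    /** One matching occurrence of the query term. */
    static final class Occurrence {
        final int docId;
        final int position;
        final byte typeCode;

        Occurrence(int docId, int position, byte typeCode) {
            this.docId = docId;
            this.position = position;
            this.typeCode = typeCode;
        }
    }

    /** Totals for one type: occurrences and the distinct documents containing them. */
    static final class TypeCount {
        int occurrences;
        final Set<Integer> docIds = new HashSet<>();
    }

    /** Tallies matches per payload type in a single pass. */
    static Map<Byte, TypeCount> countByType(List<Occurrence> matches) {
        Map<Byte, TypeCount> counts = new TreeMap<>();
        for (Occurrence m : matches) {
            TypeCount c = counts.computeIfAbsent(m.typeCode, k -> new TypeCount());
            c.occurrences++;
            c.docIds.add(m.docId);
        }
        return counts;
    }

    /** Returns the documents with at least one occurrence of the given type. */
    static Set<Integer> docsWithType(List<Occurrence> matches, byte typeCode) {
        Set<Integer> docs = new TreeSet<>();
        for (Occurrence m : matches) {
            if (m.typeCode == typeCode) {
                docs.add(m.docId);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Made-up sample data: "house" occurs three times in doc 0 and once in doc 5.
        List<Occurrence> matches = Arrays.asList(
                new Occurrence(0, 3, TYPE_A),
                new Occurrence(0, 17, TYPE_B),
                new Occurrence(0, 42, TYPE_B),
                new Occurrence(5, 1, TYPE_B));

        for (Map.Entry<Byte, TypeCount> e : countByType(matches).entrySet()) {
            TypeCount c = e.getValue();
            System.out.println("type " + e.getKey() + ": " + c.occurrences
                    + " occurrences in " + c.docIds.size() + " documents");
        }
        System.out.println("documents with TYPE_B: " + docsWithType(matches, TYPE_B));
    }
}
```

The same pass could feed a type filter: docsWithType keeps only documents that contain at least one occurrence of the requested type. How the occurrences are obtained from the index is left open here, since that depends on the custom Query or span support discussed above.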

-Grant



Re: Faceting with payloads

Posted by Karl Wettin <ka...@gmail.com>.
9 feb 2008 kl. 00.53 skrev Matt Ronge:

>
> On Feb 8, 2008, at 11:17 AM, Karl Wettin wrote:
>
>>
>> 6 feb 2008 kl. 23.10 skrev Matt Ronge:
>>
>>> I may index the token "house" maybe found in different places with  
>>> different types. If the user query contains house, I want to  
>>> report the number of instances of the token house of type A, type  
>>> B and so on.
>>>
>>> Should I be using payloads for this? If so, I'd like to be able to  
>>> count up all the instances of for each type. Then I can show the  
>>> results, along with TypeA (100 hits), TypeB (1000 hits) so on.
>>
>> Perhaps. What do you do with the numbers you extract?
>
> I would like to display this to the user along with the search
> results. They can then see that there are 100 hits for TypeA and
> ask for results of just TypeA.

Does "100 hits" explicitly mean 100 documents, or could it be 4
documents with 25 payloads each?

> So on top of being able to count hits based on the payload, I'll  
> need to run a query that looks at the payloads.


It is "not possible" to implement a query that searches for payloads.

Are you aware of how facets usually are implemented with Lucene? There  
is a lot about it in the mail archives and Solr does it out of the box.

You probably want a second field indexed with your type  
classifications, and perhaps one field per type containing something  
like size for sort/search/store.
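
To make the second-field idea concrete, here is a minimal sketch in plain Java. It does not use Lucene or Solr classes: it assumes each document's type classifications are indexed as values of a separate field, modelled here as one BitSet of document IDs per value, and that the main query has already produced a BitSet of matching documents. The class and method names (TypeFieldFacets, facetCounts, restrictToType) are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of field-based faceting; no Lucene or Solr classes are used.
class TypeFieldFacets {

    // Value of the hypothetical "type" field -> documents that contain that value.
    private final Map<String, BitSet> docsByType = new TreeMap<>();

    /** Records the type values indexed for one document. */
    void addDocument(int docId, String... types) {
        for (String type : types) {
            docsByType.computeIfAbsent(type, k -> new BitSet()).set(docId);
        }
    }

    /** Counts, for each type value, how many of the query's documents carry it. */
    Map<String, Integer> facetCounts(BitSet queryHits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, BitSet> e : docsByType.entrySet()) {
            BitSet overlap = (BitSet) queryHits.clone();
            overlap.and(e.getValue());
            counts.put(e.getKey(), overlap.cardinality());
        }
        return counts;
    }

    /** Narrows the query's documents to those that carry the given type value. */
    BitSet restrictToType(BitSet queryHits, String type) {
        BitSet result = (BitSet) queryHits.clone();
        BitSet typed = docsByType.get(type);
        if (typed == null) {
            result.clear();
        } else {
            result.and(typed);
        }
        return result;
    }

    public static void main(String[] args) {
        TypeFieldFacets facets = new TypeFieldFacets();
        facets.addDocument(0, "TypeA", "TypeB");
        facets.addDocument(1, "TypeB");
        facets.addDocument(2, "TypeA");

        // Pretend the query for "house" matched documents 0 and 1.
        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(1);

        System.out.println(facets.facetCounts(hits));
        System.out.println(facets.restrictToType(hits, "TypeA"));
    }
}
```

Note that this counts documents, not token occurrences, and it does not tie a type to a particular token: a document is counted for TypeB if any of its tokens has that type, whether or not that token is the one the query matched.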



    karl



Re: Faceting with payloads

Posted by Matt Ronge <mr...@theronge.com>.
On Feb 8, 2008, at 11:17 AM, Karl Wettin wrote:

>
> 6 feb 2008 kl. 23.10 skrev Matt Ronge:
>
>> I may index the token "house" maybe found in different places with  
>> different types. If the user query contains house, I want to report  
>> the number of instances of the token house of type A, type B and so  
>> on.
>>
>> Should I be using payloads for this? If so, I'd like to be able to  
>> count up all the instances of for each type. Then I can show the  
>> results, along with TypeA (100 hits), TypeB (1000 hits) so on.
>
> Perhaps. What do you do with the numbers you extract?

I would like to display this to the user along with the search
results. They can then see that there are 100 hits for TypeA and
ask for results of just TypeA. So on top of being able to
count hits based on the payload, I'll need to run a query that looks
at the payloads.

--
Matt Ronge



Re: Faceting with payloads

Posted by Karl Wettin <ka...@gmail.com>.
6 feb 2008 kl. 23.10 skrev Matt Ronge:

> I may index the token "house" maybe found in different places with  
> different types. If the user query contains house, I want to report  
> the number of instances of the token house of type A, type B and so  
> on.
>
> Should I be using payloads for this? If so, I'd like to be able to  
> count up all the instances of for each type. Then I can show the  
> results, along with TypeA (100 hits), TypeB (1000 hits) so on.

Perhaps. What do you do with the numbers you extract?

> If I could use something like HitCollector that was passed in the  
> token payload, that would be perfect, but it doesn't support that.  
> Any thoughts on how to go about this? Also, if I want to only allow  
> tokens of TypeB, how can I efficiently filter by TypeB; using a  
> Similarity subclass seems like a hack.

I think you will have to hack the Query classes you want to use, and  
make their Weight share some counter thingy you can inspect after  
invoking a Query in the Searcher. But I'm not sure.



   karl
