You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by zzzzz shalev <zz...@yahoo.com> on 2006/01/29 13:55:51 UTC
grouping results by fields
hey,
i have a bit of a complex problem,
i need to group results recieved in a result set,
for example:
my result set returns 10,000 results
there are about 10 fields in each result document
i need to group the most frequent values appearing in each field.
if 1 of my fields is named color: then i want the 20 most frequent colors in the 10,000 results,,,
blue - occurs 1302 times
red - occurs 200 times
and so on....
any ideas will be unbelievably appreciated,
thanx
---------------------------------
What are the most popular cars? Find out at Yahoo! Autos
Re: grouping results by fields
Posted by zzzzz shalev <zz...@yahoo.com>.
hey Jim,
thanks alot for the quick reply! much appreciated
i will look a little closer into what is done in C|Net , seems more cost efficient than what im currently doing ;)
however i am not sure how scaleable the solution is
if , for example, i recieved 20,000 results and i have 2000 different colors (a possible scenario in my case)
Jim Powers <ra...@mindspring.com> wrote:
We're doing something very similar. Recently C|Net started using Lucene and
there is a blog entry about how they implemented a "category" scheme that
basically does what you want.
http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html#a748420
The idea is to generate a bunch of bitsets where the ordinal bit number is
based on the document number in the index.
First result set: perform your search. Then generate a bitset called FRSBS
(full result set bitset).
Next: foreach "word" you want to count generate a bitset (you do this by
traversing the index directly for each word). Call these bitsets WBS(n)
(word bit set n. n spans 1..m where m is the total number of words you want
to count against)
Finally: to get a count per word bit-wise AND each WBS(n) with FRSBS and count
up the 1s
Jim Powers
On Sunday 29 January 2006 07:55, zzzzz shalev wrote:
> hey,
>
> i have a bit of a complex problem,
> i need to group results recieved in a result set,
> for example:
>
> my result set returns 10,000 results
>
> there are about 10 fields in each result document
>
> i need to group the most frequent values appearing in each field.
>
> if 1 of my fields is named color: then i want the 20 most frequent colors
> in the 10,000 results,,,
>
> blue - occurs 1302 times
> red - occurs 200 times
>
> and so on....
>
> any ideas will be unbelievably appreciated,
>
> thanx
>
>
> ---------------------------------
>
> What are the most popular cars? Find out at Yahoo! Autos
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
---------------------------------
Do you Yahoo!?
With a free 1 GB, there's more in store with Yahoo! Mail.
Re: grouping results by fields
Posted by Jim Powers <ra...@mindspring.com>.
We're doing something very similar. Recently C|Net started using Lucene and
there is a blog entry about how they implemented a "category" scheme that
basically does what you want.
http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html#a748420
The idea is to generate a bunch of bitsets where the ordinal bit number is
based on the document number in the index.
First result set: perform your search. Then generate a bitset called FRSBS
(full result set bitset).
Next: foreach "word" you want to count generate a bitset (you do this by
traversing the index directly for each word). Call these bitsets WBS(n)
(word bit set n. n spans 1..m where m is the total number of words you want
to count against)
Finally: to get a count per word bit-wise AND each WBS(n) with FRSBS and count
up the 1s
Jim Powers
On Sunday 29 January 2006 07:55, zzzzz shalev wrote:
> hey,
>
> i have a bit of a complex problem,
> i need to group results recieved in a result set,
> for example:
>
> my result set returns 10,000 results
>
> there are about 10 fields in each result document
>
> i need to group the most frequent values appearing in each field.
>
> if 1 of my fields is named color: then i want the 20 most frequent colors
> in the 10,000 results,,,
>
> blue - occurs 1302 times
> red - occurs 200 times
>
> and so on....
>
> any ideas will be unbelievably appreciated,
>
> thanx
>
>
> ---------------------------------
>
> What are the most popular cars? Find out at Yahoo! Autos
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org