Posted to user@lucenenet.apache.org by André Maldonado <an...@gmail.com> on 2009/11/09 13:43:35 UTC

Category count.

Hi all. I have a problem that is exactly like this one (written by
another developer):

"I am trying to use Lucene Java 2.3.2 to implement search on a catalog of
products. Apart from the regular fields for a product, there is field called
'Category'. A product can fall in multiple categories. Currently, I use
FilteredQuery to search for the same search term with every Category to get
the number of results per category.

This results in 20-30 internal search calls per query to display the
results. This is slowing down the search considerably. Is there a faster way
of achieving the same result using Lucene?"
But in the thread where I found this question, I didn't find any good
solution.

Can you help me?

Thanks

"Then those who were in the boat came and worshiped him, saying: Truly
you are the Son of God." (Matthew 14:33)

Re: Category count.

Posted by Matt Honeycutt <mb...@gmail.com>.
A friend and I were just talking about Solr this morning.  He sent me this:
http://code.google.com/p/solrnet/

Re: Category count.

Posted by Erik Hatcher <er...@gmail.com>.
On Nov 9, 2009, at 9:55 AM, André Maldonado wrote:
> Solr has an XML API, correct? So it can be used with .NET.

Yes, documents can come in as XML (<doc><field>value</field>...</doc>),
or as CSV, or as rich documents (like PDF, Word, etc.) over HTTP.

Several clients of ours are using Solr from .NET environments.
Interestingly, one of them even uses Solr as part of a SQL Server
query through an extension point.

	Erik


Re: Category count.

Posted by André Maldonado <an...@gmail.com>.
Solr has an XML API, correct? So it can be used with .NET.

Or am I wrong?

Thanks

"Then those who were in the boat came and worshiped him, saying: Truly
you are the Son of God." (Matthew 14:33)


Re: Category count.

Posted by Erik Hatcher <er...@gmail.com>.
Note that Solr has faceting built in, and uses Lucene's goodness too.
And it scales quite well.

	Erik
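
For readers who want to see what a faceted Solr request looks like, here is a rough sketch in Python. It only builds the request URL. The host, core path, search term and the field name "Category" are assumptions for illustration; `q`, `rows`, `facet`, `facet.field` and `wt` are standard Solr request parameters. Because the request is plain HTTP, the same URL could be built from any .NET HTTP client.

```python
from urllib.parse import urlencode

def solr_facet_url(base_url, query, facet_field):
    """Build a Solr select URL that returns per-value counts for one field."""
    params = {
        "q": query,                  # the user's search term
        "rows": 0,                   # only the counts are wanted here, no documents
        "facet": "true",             # switch faceting on
        "facet.field": facet_field,  # field whose values are counted
        "wt": "json",                # response format
    }
    return base_url + "/select?" + urlencode(params)

# Hypothetical host and field name.
url = solr_facet_url("http://localhost:8983/solr", "laptop", "Category")
print(url)
```

One request of this shape returns the category counts for the whole result set, which is the number the original question obtains with 20-30 separate searches.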


RE: Category count.

Posted by Moray McConnachie <mm...@oxford-analytica.com>.
This is basically faceted search with Lucene, I think?

Most approaches I have seen to this involve caching results and/or duplicating the facet information in an alternate data store.

The best resource I have seen uses cached results. It lets you drill down into multiple facets and get the number of documents per facet updated easily, without going back to the Lucene engine with multiple queries.

http://www.devatwork.nl/index.php/articles/lucenenet/faceted-search-and-drill-down-lucenenet/

1) at initialisation (and/or at set points), step through all the potential facet values and store the matching documents for each in some kind of cached dictionary of bit arrays
2) the user drills down into whichever facets they choose
3) you AND together the bit arrays representing each facet the user has selected
4) you count the number of set bits in the resulting bit array to get the number of articles matched.
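
A minimal sketch of steps 1) to 4) in Python, not taken from the linked article. It uses plain Python integers as bit arrays (bit i set means document i belongs to the facet). The sample catalog, the in-memory dictionary standing in for the index, and the names `build_facet_bitsets` and `count_drilldown` are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical catalog: document id -> categories (a product may have several).
DOCS = {
    0: ["books", "gifts"],
    1: ["books"],
    2: ["music", "gifts"],
    3: ["books", "music"],
}

def build_facet_bitsets(docs):
    """Step 1: one bit array per facet value; bit i is set if doc i has it."""
    bitsets = defaultdict(int)
    for doc_id, categories in docs.items():
        for category in categories:
            bitsets[category] |= 1 << doc_id
    return dict(bitsets)

def count_drilldown(bitsets, selected):
    """Steps 2-4: AND the selected facets' bit arrays, then count set bits."""
    if not selected:
        raise ValueError("select at least one facet")
    result = bitsets.get(selected[0], 0)
    for category in selected[1:]:
        result &= bitsets.get(category, 0)
    return bin(result).count("1")

facets = build_facet_bitsets(DOCS)
print(count_drilldown(facets, ["books"]))           # documents 0, 1 and 3
print(count_drilldown(facets, ["books", "music"]))  # document 3 only
```

The expensive part (building the dictionary) happens once; each drill-down afterwards is a handful of bitwise ANDs and a bit count.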

At 3) you could clearly AND this together with any other Lucene result set to get accurate counts when you are integrating facets and non-faceted search results.
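
As a rough illustration of that point: if the documents matched by a single search are also held as a bit array, one AND per facet value gives the per-category counts without re-running the search. The sketch below uses Python integers as bit arrays; the facet values, document ids and the helper names `to_bitset` and `facet_counts` are hypothetical.

```python
def to_bitset(doc_ids):
    """Pack a collection of document ids into an integer used as a bit array."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits

def facet_counts(facet_bitsets, hits_bitset):
    """For each facet value, count the documents that also match the search."""
    return {
        value: bin(bits & hits_bitset).count("1")
        for value, bits in facet_bitsets.items()
    }

# Hypothetical cached bit arrays, built once at initialisation.
facet_bitsets = {
    "books": to_bitset([0, 1, 3]),
    "music": to_bitset([2, 3]),
    "gifts": to_bitset([0, 2]),
}

# Hypothetical result of ONE search: ids of the matching documents.
hits = to_bitset([1, 2, 3])

counts = facet_counts(facet_bitsets, hits)
print(counts)
```

With 20-30 categories this is one search plus 20-30 ANDs, rather than 20-30 searches.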

The approach works best the higher the ratio of queries to updates - it will work poorly for applications with any or all of 

a) very frequent updating 
b) the need for facets to be 100% accurate in real time
c) a large number of potential facet values (initialisation could be very slow)

With a little extra work on the indexing end you could conquer a) and b) and hopefully get round the need to reinitialise from scratch.
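
One possible reading of "a little extra work on the indexing end", sketched in Python: update the cached bit arrays whenever a document is added or removed, instead of rebuilding them. This is only an assumption about how it could be done. The `FacetCache` class and its methods are hypothetical, and the sketch assumes stable, application-assigned document ids; Lucene's internal document ids can change when segments are merged, so they would not be safe to use this way without extra care.

```python
class FacetCache:
    """Facet value -> bit array (a Python int), maintained incrementally."""

    def __init__(self):
        self.bitsets = {}

    def add_document(self, doc_id, categories):
        """Set the document's bit in each of its categories."""
        for category in categories:
            self.bitsets[category] = self.bitsets.get(category, 0) | (1 << doc_id)

    def remove_document(self, doc_id):
        """Clear the document's bit everywhere; drop facet values left empty."""
        mask = ~(1 << doc_id)
        for category in list(self.bitsets):
            self.bitsets[category] &= mask
            if self.bitsets[category] == 0:
                del self.bitsets[category]

    def count(self, category):
        """Number of documents currently in the given category."""
        return bin(self.bitsets.get(category, 0)).count("1")

cache = FacetCache()
cache.add_document(0, ["books", "gifts"])
cache.add_document(1, ["books"])
cache.remove_document(0)
print(cache.count("books"), cache.count("gifts"))
```

An update is modelled as a remove followed by an add, so the counts stay current without reinitialising from scratch.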

I'm not sure how well it would work with very large datasets either, particularly where the number of matches in some facet is very large - I've never had to work with bit arrays of millions of bits!
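
A back-of-the-envelope estimate of the memory involved, as a small Python sketch. It assumes plain, uncompressed bit arrays with one bit per document for every facet value; the figures of 5 million documents and 30 categories are made up for illustration (30 echoes the "20-30" in the original question).

```python
def bitset_cache_bytes(num_docs, num_facet_values):
    """Bytes needed for one uncompressed bit array per facet value."""
    bytes_per_bitset = (num_docs + 7) // 8  # one bit per document, rounded up
    return bytes_per_bitset * num_facet_values

total = bitset_cache_bytes(5_000_000, 30)
print(total, "bytes, about", round(total / 1_000_000, 2), "MB")
```

Under these assumptions each facet value costs 625,000 bytes, so memory grows with the number of facet values times the index size, regardless of how many documents each facet actually matches.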

I like this approach because it is a 100% lucene solution and it is (relatively) fast compared to your approach so far and other similar approaches.

Faceting is such a common meme for search, I can foresee someone porting faceting functionality into the back end if indeed it is not already happening?

Yours,
Moray


------------------------------------- 
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com
