Posted to solr-user@lucene.apache.org by Bing Hua <bh...@cornell.edu> on 2014/01/22 23:59:07 UTC

Solr/Lucene Faceted Search Too Many Unique Values?

Hi,

I am going to evaluate Lucene/Solr's capabilities for handling faceted
queries, in particular with a single facet field that contains a large number
(say up to 1 million) of distinct values. Does anyone have experience
with how Lucene performs in this scenario?

e.g. 
Doc1 has tags A B C D ....
Doc2 has tags B C D E ....
and so on, for millions of docs; there can be millions of distinct tag values.

Thanks



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Lucene-Faceted-Search-Too-Many-Unique-Values-tp4112860.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr/Lucene Faceted Search Too Many Unique Values?

Posted by Yago Riveiro <ya...@gmail.com>.
In my case I need to know the number of unique visitors and the number of visits in a period of time.

I need to render the data in a table with pagination. To get the number of unique elements, and from that the total number of pages, the only way I found was to return all facet values with facet.limit=-1.
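A minimal Python sketch of the count-then-page approach described above. It assumes a hypothetical field name ("visitor_id") and uses only standard Solr request parameters (facet, facet.field, facet.limit, facet.offset, facet.mincount, rows, fq); the time-period filter is left out. It also assumes the response uses the default flat JSON layout for facet_fields ([value, count, value, count, ...]). The helper names are hypothetical and not part of any Solr client library.

```python
import math
from urllib.parse import urlencode


def facet_page_params(field: str, page: int, page_size: int) -> dict:
    """Hypothetical helper: Solr parameters for one page of facet values.

    Only facet counts are wanted, so rows=0 suppresses the document list.
    """
    return {
        "q": "*:*",  # a time-range fq would be added here in practice
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 1,               # drop values with no matching docs
        "facet.limit": page_size,          # how many values per page
        "facet.offset": page * page_size,  # page is zero-based
        "wt": "json",
    }


def total_pages(flat_counts: list, page_size: int) -> int:
    """Hypothetical helper: page count from a facet.limit=-1 response.

    flat_counts is the facet_fields list for one field, laid out as
    [value, count, value, count, ...], so it holds len/2 unique values.
    """
    unique_values = len(flat_counts) // 2
    return math.ceil(unique_values / page_size)


# Example: a response listing 3 unique visitors, shown 2 per page.
print(total_pages(["v1", 4, "v2", 2, "v3", 1], page_size=2))
print(urlencode(facet_page_params("visitor_id", page=1, page_size=2)))
```

Once the total is known, each page can be fetched with a positive facet.limit plus facet.offset instead of pulling every value on every request. The single facet.limit=-1 call still has to return every unique value, which is likely the expensive part when there are millions of them.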

--
/Yago Riveiro

On Thu, Jan 23, 2014 at 1:39 AM, Erick Erickson <er...@gmail.com>
wrote:

> A legitimate question that only you can answer is
> "what's the value of faceting on fields with so many unique values?"
> Consider the ridiculous case of faceting on <uniqueKey>. There's
> almost exactly zero value in faceting on it, since all counts will be 1.
> By analogy, with millions of tag values, will there ever be more than a very
> small count for any facet? And will showing those be useful to the
> user?
> They may be, and Yago has a use-case where the answer is "yes". Before
> trying to make Solr perform in this instance, though, I'd review the use-case
> to see if it makes sense....
> Erick

Re: Solr/Lucene Faceted Search Too Many Unique Values?

Posted by Erick Erickson <er...@gmail.com>.
A legitimate question that only you can answer is
"what's the value of faceting on fields with so many unique values?"

Consider the ridiculous case of faceting on <uniqueKey>. There's
almost exactly zero value in faceting on it, since all counts will be 1.

By analogy, with millions of tag values, will there ever be more than a very
small count for any facet? And will showing those be useful to the
user?

They may be, and Yago has a use-case where the answer is "yes". Before
trying to make Solr perform in this instance, though, I'd review the use-case
to see if it makes sense....

Erick

On Wed, Jan 22, 2014 at 5:09 PM, Yago Riveiro <ya...@gmail.com> wrote:
> You will need to use DocValues if you want to facet on this many terms without blowing the heap.
>
> I have facets with ~39M unique terms; the response time is about 10-40 seconds, which in my case is not a problem.
>
> --
> Yago Riveiro
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Wednesday, January 22, 2014 at 10:59 PM, Bing Hua wrote:
>
>> Hi,
>>
>> I am going to evaluate some Lucene/Solr capabilities on handling faceted
>> queries, in particular, with a single facet field that contains large number
>> (say up to 1 million) of distinct values. Does anyone have some experience
>> on how lucene performs in this scenario?
>>
>> e.g.
>> Doc1 has tags A B C D ....
>> Doc2 has tags B C D E ....
>> etc etc millions of docs and there can be millions of distinct tag values.
>>
>> Thanks
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/Solr-Lucene-Faceted-Search-Too-Many-Unique-Values-tp4112860.html
>> Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
>>
>>
>
>

Re: Solr/Lucene Faceted Search Too Many Unique Values?

Posted by Yago Riveiro <ya...@gmail.com>.
You will need to use DocValues if you want to facet on this many terms without blowing the heap.

I have facets with ~39M unique terms; the response time is about 10-40 seconds, which in my case is not a problem.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


Re: Solr/Lucene Faceted Search Too Many Unique Values?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2014-01-22 at 23:59 +0100, Bing Hua wrote:
> I am going to evaluate Lucene/Solr's capabilities for handling faceted
> queries, in particular with a single facet field that contains a large number
> (say up to 1 million) of distinct values. Does anyone have experience
> with how Lucene performs in this scenario?

We facet on Author (11.5M unique values) and Subject (3.8M unique
values) across our 12M documents. Each individual document typically has a
small number of authors and subjects. Two indexes of about 50GB each, 3GB
heap, 5GB RAM free for disk cache, SSD, 4-core Intel Xeon L5420 @ 2.50GHz.

Response time is around 1-200 ms for most queries, some queries taking
1-2 seconds and 1-2% of queries taking 3-10 seconds.

We use a home-grown faceting system on top of Lucene, but previous tests
showed its performance and memory requirements to be quite similar to Solr
faceting, as both use the same algorithm (assuming facet.method=fc).
I do not know how our performance compares to Lucene faceting.
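To make the memory side of this concrete, here is a simplified Python illustration of ordinal-based counting in the spirit of facet.method=fc. It is not Solr's actual implementation, and the function names are hypothetical. Each distinct term gets an integer ordinal, each document stores its ordinals, and a query allocates one counter per unique value. That per-request array is why the number of unique values matters: assuming 4-byte counters, the 11.5M-value Author field alone would need about 46MB of counters per concurrent request.

```python
import heapq


def build_ordinals(docs):
    """Assign ordinals in sorted term order (like a term dictionary) and
    translate each document's tags into a list of ordinals."""
    terms = sorted({tag for tags in docs for tag in tags})
    ord_of = {term: i for i, term in enumerate(terms)}
    doc_ords = [[ord_of[tag] for tag in tags] for tags in docs]
    return terms, doc_ords


def facet_counts(terms, doc_ords, matching_docs, top_n):
    """Count tags over the matching documents and return the top_n values.

    counts has one slot per unique term, so its size depends on the number
    of unique values, not on the number of hits. (A Python list stores
    object references; a real engine would use a packed int array.)
    """
    counts = [0] * len(terms)
    for doc_id in matching_docs:
        for o in doc_ords[doc_id]:
            counts[o] += 1
    # Highest count first; ties broken by ordinal, i.e. term order.
    nonzero = (o for o, c in enumerate(counts) if c > 0)
    best = heapq.nsmallest(top_n, nonzero, key=lambda o: (-counts[o], o))
    return [(terms[o], counts[o]) for o in best]


# The two example documents from the original question.
docs = [["A", "B", "C", "D"], ["B", "C", "D", "E"]]
terms, doc_ords = build_ordinals(docs)
print(facet_counts(terms, doc_ords, matching_docs=[0, 1], top_n=3))
# [('B', 2), ('C', 2), ('D', 2)]
```

Because ordinals follow sorted term order, walking the counts array by ordinal instead of selecting the top counts would yield the values sorted by term.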


The dreaded "Too Many Unique Values" is not a performance problem but a
hard limit on the number of unique values imposed by Solr fc-faceting:
16M, as far as I remember. I do not know whether Lucene faceting has the same
limit.

- Toke Eskildsen, State and University Library, Denmark