Posted to solr-user@lucene.apache.org by lei <si...@gmail.com> on 2015/03/05 20:18:50 UTC

Performance on faceting using docValues

Hi there,

I'm testing facet performance with and without docValues in Solr 4.7, and
found that on the first request, faceting with docValues is much faster than
without. However, for subsequent requests (where the queries are
cached), faceting with docValues is slower than without. Is
this expected behavior? Any ideas or solutions are appreciated. Thanks.
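
One way to make this comparison reproducible is to time the first request separately from the warm ones. A minimal sketch, assuming a hypothetical fetch callable that issues one facet request; the helper name time_cold_and_warm and the number of warm runs are illustrative and not from the original post:

```python
import statistics
import time
from typing import Callable, Tuple


def time_cold_and_warm(fetch: Callable[[], object], warm_runs: int = 10) -> Tuple[float, float]:
    """Return (first-call ms, median ms of the following warm_runs calls)."""
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")

    def timed_ms() -> float:
        # Wall-clock duration of a single fetch() call, in milliseconds.
        start = time.perf_counter()
        fetch()
        return (time.perf_counter() - start) * 1000.0

    cold = timed_ms()  # first request, before any caches are populated
    warm = [timed_ms() for _ in range(warm_runs)]  # repeated, presumably cached requests
    return cold, statistics.median(warm)
```

In practice fetch would wrap an HTTP call to the Solr select handler. The first number is only a true cold measurement if nothing has warmed the searcher beforehand, so each configuration would probably need a fresh start before it is measured.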

RE: Performance on faceting using docValues

Posted by "Ryan, Michael F. (LNG-DAY)" <mi...@lexisnexis.com>.
This is consistent with my experience. DocValues is faster for the first call (compared to UnInvertedField, which is what is used when there are no DocValues), but is slower on subsequent calls.

I'm curious about this as well, since I haven't heard anyone else mention it before. I thought maybe I was the only one...

-Michael

-----Original Message-----
From: lei [mailto:simplely@gmail.com] 
Sent: Thursday, March 05, 2015 2:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance on faceting using docValues

Here are the specs of an example query faceting on three fields (all string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues), consistently
the total # of docs returned is around 600,000



On Thu, Mar 5, 2015 at 11:18 AM, lei <si...@gmail.com> wrote:

> Hi there,
>
> I'm testing facet performance with vs without docValues in Solr 4.7, 
> and found that on first request, performance with docValues is much 
> faster than non-docValues. However, for subsequent requests (where the 
> queries are cached), the performance is slower for docValues than 
> non-docValues. Is this an expected behavior? Any idea or solution is appreciated. Thanks.
>

Re: Performance on faceting using docValues

Posted by lei <si...@gmail.com>.
Here are the specs of an example query faceting on three fields (all
string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues),
consistently
the total # of docs returned is around 600,000



On Thu, Mar 5, 2015 at 11:18 AM, lei <si...@gmail.com> wrote:

> Hi there,
>
> I'm testing facet performance with vs without docValues in Solr 4.7, and
> found that on first request, performance with docValues is much faster
> than non-docValues. However, for subsequent requests (where the queries are
> cached), the performance is slower for docValues than non-docValues. Is
> this an expected behavior? Any idea or solution is appreciated. Thanks.
>

Re: Performance on faceting using docValues

Posted by lei <si...@gmail.com>.
The term histograms are shared in this link. Sorry for the confusion.

https://docs.google.com/presentation/d/1tma4hkYjxJfBTnMbO6Pq_dUHqZ0wI_UTlgoVqXtW4ZA/pub?start=false&loop=false&delayms=3000&slide=id.p


> On Mon, Mar 9, 2015 at 10:56 AM, Anshum Gupta <an...@anshumgupta.net>
> wrote:
>
>> Hi Lei,
>>
>> The mailing list doesn't allow attachments. Can you share these via a file
>> sharing platform?
>>
>> On Mon, Mar 9, 2015 at 12:48 AM, lei <si...@gmail.com> wrote:
>>
>> > The Solr instance is single-shard. Index size is around 20G and total doc
>> > # is about 12 million. Below are the histograms for the three facet fields
>> > in my query. Thanks.
>> >
>> >
>> > On Thu, Mar 5, 2015 at 11:57 PM, Toke Eskildsen <te@statsbiblioteket.dk>
>> > wrote:
>> >
>> >> On Thu, 2015-03-05 at 21:14 +0100, lei wrote:
>> >>
>> >> You present a very interesting observation. I have not noticed what you
>> >> describe, but on the other hand we have not done comparative speed
>> >> tests.
>> >>
>> >> > q=*:*&fq=country:"US"&fq=category:112
>> >>
>> >> First observation: Your query is '*:*, which is a "magic" query. Non-DV
>> >> faceting has optimizations both for this query (although that ought to
>> >> be disabled due to the fq) and for the "inverse" case where there are
>> >> more hits than non-hits. Perhaps you could test with a handful of
>> >> queries, which has different result sizes?
>> >>
>> >> > &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000
>> >>
>> >> The combination of index order and a high limit might be an explanation:
>> >> When resolving the Strings of the facet result, non-DV will perform
>> >> ordinal-lookup, which is fast when done in monotonic rising order
>> >> (sort=index) and if the values are close (limit=2000). I do not know if
>> >> DV benefits the same way.
>> >>
>> >> On the other hand, your limit seems to apply only to material, so it
>> >> could be that the real number of unique values is low and you just set
>> >> the limit to 2000 to be sure you get everything?
>> >>
>> >> > &facet.field=manufacturer&facet.field=seller&facet.field=material
>> >> > &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
>> >> > &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
>> >> > &f.material.facet.mincount=1&sort=score+desc
>> >>
>> >> How large is your index in bytes, how many documents does it contain and
>> >> is it single-shard or cloud? Could you paste the loglines containing
>> >> "UnInverted field", which describes the number of unique values and size
>> >> of your facet fields?
>> >>
>> >> - Toke Eskildsen, State and University Library, Denmark
>> >>
>> >>
>>
>>
>> --
>> Anshum Gupta
>>
>
>

Re: Performance on faceting using docValues

Posted by lei <si...@gmail.com>.
Sure, here is the link to the image of term histograms. Thanks.

https://docs.google.com/presentation/d/1tma4hkYjxJfBTnMbO6Pq_dUHqZ0wI_UTlgoVqXtW4ZA/edit?usp=sharing

On Mon, Mar 9, 2015 at 10:56 AM, Anshum Gupta <an...@anshumgupta.net>
wrote:

> Hi Lei,
>
> The mailing list doesn't allow attachments. Can you share these via a file
> sharing platform?
>
> On Mon, Mar 9, 2015 at 12:48 AM, lei <si...@gmail.com> wrote:
>
> > The Solr instance is single-shard. Index size is around 20G and total doc
> > # is about 12 million. Below are the histograms for the three facet fields
> > in my query. Thanks.
> >
> >
> > On Thu, Mar 5, 2015 at 11:57 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
> > wrote:
> >
> >> On Thu, 2015-03-05 at 21:14 +0100, lei wrote:
> >>
> >> You present a very interesting observation. I have not noticed what you
> >> describe, but on the other hand we have not done comparative speed
> >> tests.
> >>
> >> > q=*:*&fq=country:"US"&fq=category:112
> >>
> >> First observation: Your query is '*:*, which is a "magic" query. Non-DV
> >> faceting has optimizations both for this query (although that ought to
> >> be disabled due to the fq) and for the "inverse" case where there are
> >> more hits than non-hits. Perhaps you could test with a handful of
> >> queries, which has different result sizes?
> >>
> >> > &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000
> >>
> >> The combination of index order and a high limit might be an explanation:
> >> When resolving the Strings of the facet result, non-DV will perform
> >> ordinal-lookup, which is fast when done in monotonic rising order
> >> (sort=index) and if the values are close (limit=2000). I do not know if
> >> DV benefits the same way.
> >>
> >> On the other hand, your limit seems to apply only to material, so it
> >> could be that the real number of unique values is low and you just set
> >> the limit to 2000 to be sure you get everything?
> >>
> >> > &facet.field=manufacturer&facet.field=seller&facet.field=material
> >> > &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
> >> > &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
> >> > &f.material.facet.mincount=1&sort=score+desc
> >>
> >> How large is your index in bytes, how many documents does it contain and
> >> is it single-shard or cloud? Could you paste the loglines containing
> >> "UnInverted field", which describes the number of unique values and size
> >> of your facet fields?
> >>
> >> - Toke Eskildsen, State and University Library, Denmark
> >>
> >>
>
>
> --
> Anshum Gupta
>

Re: Performance on faceting using docValues

Posted by Anshum Gupta <an...@anshumgupta.net>.
Hi Lei,

The mailing list doesn't allow attachments. Can you share these via a file
sharing platform?

On Mon, Mar 9, 2015 at 12:48 AM, lei <si...@gmail.com> wrote:

> The Solr instance is single-shard. Index size is around 20G and total doc
> # is about 12 million. Below are the histograms for the three facet fields
> in my query. Thanks.
>
>
> On Thu, Mar 5, 2015 at 11:57 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
> wrote:
>
>> On Thu, 2015-03-05 at 21:14 +0100, lei wrote:
>>
>> You present a very interesting observation. I have not noticed what you
>> describe, but on the other hand we have not done comparative speed
>> tests.
>>
>> > q=*:*&fq=country:"US"&fq=category:112
>>
>> First observation: Your query is '*:*, which is a "magic" query. Non-DV
>> faceting has optimizations both for this query (although that ought to
>> be disabled due to the fq) and for the "inverse" case where there are
>> more hits than non-hits. Perhaps you could test with a handful of
>> queries, which has different result sizes?
>>
>> > &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000
>>
>> The combination of index order and a high limit might be an explanation:
>> When resolving the Strings of the facet result, non-DV will perform
>> ordinal-lookup, which is fast when done in monotonic rising order
>> (sort=index) and if the values are close (limit=2000). I do not know if
>> DV benefits the same way.
>>
>> On the other hand, your limit seems to apply only to material, so it
>> could be that the real number of unique values is low and you just set
>> the limit to 2000 to be sure you get everything?
>>
>> > &facet.field=manufacturer&facet.field=seller&facet.field=material
>> > &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
>> > &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
>> > &f.material.facet.mincount=1&sort=score+desc
>>
>> How large is your index in bytes, how many documents does it contain and
>> is it single-shard or cloud? Could you paste the loglines containing
>> "UnInverted field", which describes the number of unique values and size
>> of your facet fields?
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>>


-- 
Anshum Gupta

Re: Performance on faceting using docValues

Posted by lei <si...@gmail.com>.
The Solr instance is single-shard. Index size is around 20G and total doc #
is about 12 million. Below are the histograms for the three facet fields in
my query. Thanks.


On Thu, Mar 5, 2015 at 11:57 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> On Thu, 2015-03-05 at 21:14 +0100, lei wrote:
>
> You present a very interesting observation. I have not noticed what you
> describe, but on the other hand we have not done comparative speed
> tests.
>
> > q=*:*&fq=country:"US"&fq=category:112
>
> First observation: Your query is '*:*, which is a "magic" query. Non-DV
> faceting has optimizations both for this query (although that ought to
> be disabled due to the fq) and for the "inverse" case where there are
> more hits than non-hits. Perhaps you could test with a handful of
> queries, which has different result sizes?
>
> > &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000
>
> The combination of index order and a high limit might be an explanation:
> When resolving the Strings of the facet result, non-DV will perform
> ordinal-lookup, which is fast when done in monotonic rising order
> (sort=index) and if the values are close (limit=2000). I do not know if
> DV benefits the same way.
>
> On the other hand, your limit seems to apply only to material, so it
> could be that the real number of unique values is low and you just set
> the limit to 2000 to be sure you get everything?
>
> > &facet.field=manufacturer&facet.field=seller&facet.field=material
> > &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
> > &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
> > &f.material.facet.mincount=1&sort=score+desc
>
> How large is your index in bytes, how many documents does it contain and
> is it single-shard or cloud? Could you paste the loglines containing
> "UnInverted field", which describes the number of unique values and size
> of your facet fields?
>
> - Toke Eskildsen, State and University Library, Denmark
>
>

Re: Performance on faceting using docValues

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2015-03-05 at 21:14 +0100, lei wrote:

You present a very interesting observation. I have not noticed what you
describe, but on the other hand we have not done comparative speed
tests.

> q=*:*&fq=country:"US"&fq=category:112

First observation: Your query is '*:*', which is a "magic" query. Non-DV
faceting has optimizations both for this query (although that ought to
be disabled due to the fq) and for the "inverse" case where there are
more hits than non-hits. Perhaps you could test with a handful of
queries that have different result sizes?
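
A small sketch of how such a set of test queries could be generated, assuming the facet parameters from the original query stay fixed and only q and the fq filters vary. The helper name build_facet_url, the base URL with its collection1 core name, and the last variant are hypothetical:

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlencode

# Facet parameters kept constant across variants (taken from the query in this thread).
FACET_PARAMS: List[Tuple[str, str]] = [
    ("facet", "on"),
    ("facet.sort", "index"),
    ("facet.mincount", "1"),
    ("facet.limit", "2000"),
    ("facet.field", "manufacturer"),
    ("facet.field", "seller"),
    ("facet.field", "material"),
]


def build_facet_url(base: str, q: str, fqs: Sequence[str]) -> str:
    """Build a select URL with the given main query and filter queries."""
    params = [("q", q)] + [("fq", fq) for fq in fqs] + FACET_PARAMS
    return base + "?" + urlencode(params)  # urlencode percent-escapes *, : and quotes


# Variants with progressively narrower result sets; the base URL is an assumption.
BASE = "http://localhost:8983/solr/collection1/select"
VARIANTS = [
    ("*:*", []),                                # match-all, no filters
    ("*:*", ['country:"US"']),
    ("*:*", ['country:"US"', "category:112"]),  # the query from the thread
    ("material:steel", ['country:"US"']),       # hypothetical non-match-all query
]
urls = [build_facet_url(BASE, q, fqs) for q, fqs in VARIANTS]
```

Each URL can then be timed with and without docValues, recording the hit count next to the timings so that result size can be compared against faceting time.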

> &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000

The combination of index order and a high limit might be an explanation:
When resolving the Strings of the facet result, non-DV will perform
ordinal-lookup, which is fast when done in monotonic rising order
(sort=index) and if the values are close (limit=2000). I do not know if
DV benefits the same way.

On the other hand, your limit seems to apply only to material, so it
could be that the real number of unique values is low and you just set
the limit to 2000 to be sure you get everything?

> &facet.field=manufacturer&facet.field=seller&facet.field=material
> &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
> &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
> &f.material.facet.mincount=1&sort=score+desc
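
Per-field parameters of the form f.<field>.facet.<param> take precedence over the global facet.<param> values, which is why the global index sort and limit of 2000 end up applying only to material. A small sketch that resolves the effective settings from the query string in this thread; the helper name effective_facet_settings is hypothetical, and only sort, limit and mincount are handled:

```python
from typing import Dict, List
from urllib.parse import parse_qsl

# The query from this thread, split across lines for readability.
QUERY = (
    'q=*:*&fq=country:"US"&fq=category:112'
    "&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000"
    "&facet.field=manufacturer&facet.field=seller&facet.field=material"
    "&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100"
    "&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100"
    "&f.material.facet.mincount=1&sort=score+desc"
)


def effective_facet_settings(query: str) -> Dict[str, Dict[str, str]]:
    """Map each facet.field to its sort/limit/mincount after per-field overrides."""
    pairs = parse_qsl(query)
    params = dict(pairs)  # last value wins, which is fine for single-valued settings
    fields: List[str] = [value for key, value in pairs if key == "facet.field"]
    result: Dict[str, Dict[str, str]] = {}
    for field in fields:
        result[field] = {
            # Per-field value first, then the global one; empty string if neither
            # is set (Solr's built-in defaults are not modeled here).
            name: params.get(f"f.{field}.facet.{name}", params.get(f"facet.{name}", ""))
            for name in ("sort", "limit", "mincount")
        }
    return result


settings = effective_facet_settings(QUERY)
# settings["manufacturer"] -> {"sort": "count", "limit": "100", "mincount": "1"}
# settings["material"]     -> {"sort": "index", "limit": "2000", "mincount": "1"}
```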

How large is your index in bytes, how many documents does it contain and
is it single-shard or cloud? Could you paste the loglines containing
"UnInverted field", which describes the number of unique values and size
of your facet fields?

- Toke Eskildsen, State and University Library, Denmark



Re: Performance on faceting using docValues

Posted by lei <si...@gmail.com>.
There was a mistake in the previous email: the timings for subsequent calls
were swapped.

Here are the corrected specs of an example query faceting on three fields (all
string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 100+ ms (with docValues) vs. 30+ ms (w/o docValues),
consistently
the total # of docs returned is around 600,000

The query looks like this:

q=*:*&fq=country:"US"&fq=category:112&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000&facet.field=manufacturer&facet.field=seller&facet.field=material&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100&f.material.facet.mincount=1&sort=score+desc

Thanks,

On Thu, Mar 5, 2015 at 11:42 AM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Hello,
>
> I have one consideration on top of my head, would you mind to show a brief
> snapshot by a sampler?
>
> On Thu, Mar 5, 2015 at 10:18 PM, lei <si...@gmail.com> wrote:
>
> > Hi there,
> >
> > I'm testing facet performance with vs without docValues in Solr 4.7, and
> > found that on first request, performance with docValues is much faster than
> > non-docValues. However, for subsequent requests (where the queries are
> > cached), the performance is slower for docValues than non-docValues. Is
> > this an expected behavior? Any idea or solution is appreciated. Thanks.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mk...@griddynamics.com>
>

Re: Performance on faceting using docValues

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello,

I have one idea off the top of my head. Would you mind sharing a brief
snapshot from a sampler (sampling profiler)?

On Thu, Mar 5, 2015 at 10:18 PM, lei <si...@gmail.com> wrote:

> Hi there,
>
> I'm testing facet performance with vs without docValues in Solr 4.7, and
> found that on first request, performance with docValues is much faster than
> non-docValues. However, for subsequent requests (where the queries are
> cached), the performance is slower for docValues than non-docValues. Is
> this an expected behavior? Any idea or solution is appreciated. Thanks.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>