Posted to solr-user@lucene.apache.org by Mikhail Ibraheem <mi...@oracle.com> on 2017/04/30 07:48:44 UTC

JSON facet performance for aggregations

Hi,

I am trying to do aggregation with JSON faceting but performance is very bad for one of the requests:

json.facet={
   studentId:{
      type:terms,
      limit:-1,
      field:"studentId",
      facet:{
         x:"sum(grades)"
      }
   }
}

This request takes 250 seconds to finish. We can't paginate this service for functional reasons, so we have to use limit:-1. The cardinality of studentId is 7500.

The same aggregation with flat faceting finishes in 3 seconds:

stats=true&facet=true&stats.field={!tag=piv1 sum=true}grades&facet.pivot={!stats=piv1}studentId
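
For readers who want to reproduce the comparison, here is a minimal Python sketch that builds the parameters for both requests above. The endpoint URL, the collection name "students", and the q=*:* and rows=0 parameters are assumptions added for illustration; only the json.facet, stats and facet.pivot values come from the message.

```python
import json
from urllib.parse import urlencode

# JSON Facet API request: a terms facet over studentId that returns every
# bucket (limit -1) and computes sum(grades) inside each bucket.
json_facet = {
    "studentId": {
        "type": "terms",
        "field": "studentId",
        "limit": -1,
        "facet": {"x": "sum(grades)"},
    }
}
# q and rows are assumed here; the original message does not state them.
json_params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(json_facet)}

# Flat equivalent: a stats.field tagged piv1, attached to a pivot facet.
flat_params = {
    "q": "*:*",
    "rows": 0,
    "stats": "true",
    "facet": "true",
    "stats.field": "{!tag=piv1 sum=true}grades",
    "facet.pivot": "{!stats=piv1}studentId",
}

# Hypothetical endpoint: host and collection name are placeholders.
BASE_URL = "http://localhost:8983/solr/students/select"


def build_url(params):
    """Return the request URL with all parameter values URL-escaped."""
    return BASE_URL + "?" + urlencode(params)
```

Calling build_url(json_params) and build_url(flat_params) gives two URLs that can be timed against the same collection, which keeps the local-params syntax ({!tag=...}) correctly escaped.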

We are hoping to use one approach, JSON or flat, for all our services. JSON facet performance is better in many cases.

Please advise on why the performance of this request is so bad and whether we can improve it. Also, what is the default algorithm used for JSON faceting?


Thanks

Mikhail

Re: JSON facet performance for aggregations

Posted by Saman Rasheed <sa...@hotmail.com>.

Hi Yonik,

I like your work on Solr very much, and I'm hoping it can deliver what we are looking to achieve here. Apologies for the direct approach, but I don't have a choice: I submitted the request below to the mailing list and I still haven't had a reply. Part of me wonders whether that is because I have missed something very obvious, or because my approach to the problem uses the wrong technology.

The mailing list is not allowing me to send you a direct link to the issue unless you want to see my message with a lot of XML 😊

So I'm pasting the contents of my message below:

thanks,

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I have an English book whose contents I have successfully indexed into a field called 'content',
with the following properties:

<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"
termVectors="true" termPositions="true" termOffsets="true"/>

So if I need the number of occurrences of terms matching a pattern, e.g. '*olomo*', and my document
contains 'Solomon' twice, the result should give me 'Solomon' with a term frequency of 2.

I've tried going through the term vector section in the reference and various other posts
on the internet, but I still haven't managed to figure out how.

The nearest I have found is the following request:

http://localhost:8983/solr/test/tvrh?q=content:[*%20TO%20*]&indent=true&tv.tf=true&tv.df=true

This brings my PC to a near halt for a couple of minutes, and then it returns the term
frequency of every term! But I only need the term frequency of terms matching a particular pattern/regex.

Is there a way to narrow it down to just one pattern, e.g. *thing*, so that it finds soothing,
something, everything, each with its number of occurrences in the document?

thanks,

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________
From: Yonik Seeley <ys...@gmail.com>
Sent: 24 May 2017 10:45
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

On Mon, May 8, 2017 at 11:27 AM, Yonik Seeley <ys...@gmail.com> wrote:
> I opened https://issues.apache.org/jira/browse/SOLR-10634 to address
> this performance issue.

OK, this has been committed.
A quick test shows about a 30x speedup when faceting on a
string/numeric docvalues field with 100K unique values and doing a
simple aggregation on another numeric field (and when the limit:-1).

-Yonik

Re: JSON facet performance for aggregations

Posted by Yonik Seeley <ys...@gmail.com>.

On Mon, May 8, 2017 at 11:27 AM, Yonik Seeley <ys...@gmail.com> wrote:
> I opened https://issues.apache.org/jira/browse/SOLR-10634 to address
> this performance issue.

OK, this has been committed.
A quick test shows about a 30x speedup when faceting on a
string/numeric docvalues field with 100K unique values and doing a
simple aggregation on another numeric field (with limit:-1).

-Yonik

Re: JSON facet performance for aggregations

Posted by Yonik Seeley <ys...@gmail.com>.

On Mon, May 8, 2017 at 3:55 AM, Mikhail Ibraheem
<mi...@oracle.com> wrote:
> Thanks Yonik.
> It is double because our use case allows to group by any field of any type.

Grouping in Solr does not require a double type, so I'm not sure how
that logically follows.  Perhaps it's a limitation in the system using
Solr?

> According to your below valuable explanation, is it better at this case to use flat faceting instead of JSON faceting?

I don't think it would help.

I opened https://issues.apache.org/jira/browse/SOLR-10634 to address
this performance issue.

> Indexing the field should give us better performance than flat faceting?

Indexing the studentId field should give better performance wherever
you need to search for or filter by specific student ids.

-Yonik


> Indexing the field should give us better performance than flat faceting?
> Do you recommend streaming at that case?
>
> Please advise.
>
> Thanks
> Mikhail
>
> -----Original Message-----
> From: Yonik Seeley [mailto:yseeley@gmail.com]
> Sent: Sunday, May 07, 2017 6:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facet performance for aggregations
>
> OK, so I think I know what's going on.
>
> The current code is more optimized for finding the top K buckets from a total of N.
> When one asks to return the top 10 buckets when there are potentially millions of buckets, it makes sense to defer calculating other metrics for those buckets until we know which ones they are.  After we identify the top 10 buckets, we calculate the domain for that bucket and use that to calculate the remaining metrics.
>
> The current method is obviously much slower when one is requesting
> *all* buckets.  We might as well just calculate all metrics in the first pass rather than trying to defer them.
>
> This inefficiency is compounded by the fact that the fields are not indexed.  In the second phase, finding the domain for a bucket is a field query.  For an indexed field, this would involve a single term lookup.  For a non-indexed docValues field, this involves a full column scan.
>
> If you ever want to do quick lookups on studentId, it would make sense for it to be indexed (and why is it a double, anyway?)
>
> I'll open up a JIRA issue for the first problem (don't defer metrics if we're going to return all buckets anyway)
>
> -Yonik
>
>
> On Sun, Apr 30, 2017 at 8:58 AM, Mikhail Ibraheem <mi...@oracle.com> wrote:
>> Hi Yonik,
>> We are using Solr 6.5
>> Both studentId and grades are double:
>>   <fieldType name="double" class="solr.TrieDoubleField"
>> indexed="false" stored="true" docValues="true" multiValued="false"
>> required="false"/>
>>
>> We have 1.5 million records.
>>
>> Thanks
>> Mikhail
>>
>> -----Original Message-----
>> From: Yonik Seeley [mailto:yseeley@gmail.com]
>> Sent: Sunday, April 30, 2017 1:04 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: JSON facet performance for aggregations
>>
>> It is odd there would be quite such a big performance delta.
>> What version of solr are you using?
>> What is the fieldType of "grades"?
>> -Yonik
>>
>>
>> On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem <mi...@oracle.com> wrote:
>>> 1-
>>> studentId has docValue = true . it is of type double which is
>>> <fieldType name="double" class="solr.TrieDoubleField" indexed="false"
>>> stored="true" docValues="true" multiValued="false" required="false"/>
>>>
>>>
>>> 2- If we just facet without aggregation it finishes in good time 60ms:
>>>
>>> json.facet={
>>>    studentId:{
>>>       type:terms,
>>>       limit:-1,
>>>       field:" studentId "
>>>
>>>    }
>>> }
>>>
>>>
>>> Thanks
>>>
>>>
>>> -----Original Message-----
>>> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
>>> Sent: Sunday, April 30, 2017 10:44 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: JSON facet performance for aggregations
>>>
>>> Please enable doc values and try.
>>> There is a bug in the source code which causes json facet on string field to run very slow. On numeric fields it runs fine with doc value enabled.
>>>
>>> On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem"
>>> <mi...@oracle.com>
>>> wrote:
>>>
>>>> Hi Vijay,
>>>> It is already numeric field.
>>>> It is huge difference between json and flat here. Do you know the
>>>> reason for this? Is there a way to improve it ?
>>>>
>>>> -----Original Message-----
>>>> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
>>>> Sent: Sunday, April 30, 2017 9:58 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: JSON facet performance for aggregations
>>>>
>>>> Json facet on string fields run lot slower than on numeric fields.
>>>> Try and see if you can represent studentid as a numeric field.
>>>>
>>>> On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem"
>>>> <mi...@oracle.com>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > I am trying to do aggregation with JSON faceting but performance
>>>> > is very bad for one of the requests:
>>>> >
>>>> > json.facet={
>>>> >
>>>> >    studentId:{
>>>> >
>>>> >       type:terms,
>>>> >
>>>> >       limit:-1,
>>>> >
>>>> >       field:"studentId",
>>>> >
>>>> >                   facet:{
>>>> >
>>>> >                   x:"sum(grades)"
>>>> >
>>>> >                   }
>>>> >
>>>> >    }
>>>> >
>>>> > }
>>>> >
>>>> >
>>>> >
>>>> > This request finishes in 250 seconds, and we can't paginate for
>>>> > this service for functional reason so we have to use limit:-1, and
>>>> > the cardinality of the studentId is 7500.
>>>> >
>>>> >
>>>> >
>>>> > If I try the same with flat facet it finishes in 3 seconds :
>>>> > stats=true&facet=true&stats.field={!tag=piv1
>>>> > sum=true}grades&facet.pivot={!stats=piv1}studentId
>>>> >
>>>> >
>>>> >
>>>> > We are hoping to use one approach json or flat for all our services.
>>>> > JSON facet performance is better for many case.
>>>> >
>>>> >
>>>> >
>>>> > Please advise on why the performance for this is so bad and if we
>>>> > can improve it. Also what is the default algorithm used for json facet.
>>>> >
>>>> >
>>>> >
>>>> > Thanks
>>>> >
>>>> > Mikhail
>>>> >
>>>>

RE: JSON facet performance for aggregations

Posted by Mikhail Ibraheem <mi...@oracle.com>.

Thanks Yonik.
It is a double because our use case allows grouping by any field of any type.
Given your valuable explanation below, is it better in this case to use flat faceting instead of JSON faceting?
Would indexing the field give us better performance than flat faceting?
Do you recommend streaming in that case?

Please advise.

Thanks
Mikhail

-----Original Message-----
From: Yonik Seeley [mailto:yseeley@gmail.com] 
Sent: Sunday, May 07, 2017 6:25 PM
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

OK, so I think I know what's going on.

The current code is more optimized for finding the top K buckets from a total of N.
When one asks to return the top 10 buckets when there are potentially millions of buckets, it makes sense to defer calculating other metrics for those buckets until we know which ones they are.  After we identify the top 10 buckets, we calculate the domain for that bucket and use that to calculate the remaining metrics.

The current method is obviously much slower when one is requesting
*all* buckets.  We might as well just calculate all metrics in the first pass rather than trying to defer them.

This inefficiency is compounded by the fact that the fields are not indexed.  In the second phase, finding the domain for a bucket is a field query.  For an indexed field, this would involve a single term lookup.  For a non-indexed docValues field, this involves a full column scan.

If you ever want to do quick lookups on studentId, it would make sense for it to be indexed (and why is it a double, anyway?)

I'll open up a JIRA issue for the first problem (don't defer metrics if we're going to return all buckets anyway)

-Yonik


On Sun, Apr 30, 2017 at 8:58 AM, Mikhail Ibraheem <mi...@oracle.com> wrote:
> Hi Yonik,
> We are using Solr 6.5
> Both studentId and grades are double:
>   <fieldType name="double" class="solr.TrieDoubleField" 
> indexed="false" stored="true" docValues="true" multiValued="false" 
> required="false"/>
>
> We have 1.5 million records.
>
> Thanks
> Mikhail
>
> -----Original Message-----
> From: Yonik Seeley [mailto:yseeley@gmail.com]
> Sent: Sunday, April 30, 2017 1:04 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facet performance for aggregations
>
> It is odd there would be quite such a big performance delta.
> What version of solr are you using?
> What is the fieldType of "grades"?
> -Yonik
>
>
> On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem <mi...@oracle.com> wrote:
>> 1-
>> studentId has docValue = true . it is of type double which is 
>> <fieldType name="double" class="solr.TrieDoubleField" indexed="false"
>> stored="true" docValues="true" multiValued="false" required="false"/>
>>
>>
>> 2- If we just facet without aggregation it finishes in good time 60ms:
>>
>> json.facet={
>>    studentId:{
>>       type:terms,
>>       limit:-1,
>>       field:" studentId "
>>
>>    }
>> }
>>
>>
>> Thanks
>>
>>
>> -----Original Message-----
>> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
>> Sent: Sunday, April 30, 2017 10:44 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: JSON facet performance for aggregations
>>
>> Please enable doc values and try.
>> There is a bug in the source code which causes json facet on string field to run very slow. On numeric fields it runs fine with doc value enabled.
>>
>> On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem"
>> <mi...@oracle.com>
>> wrote:
>>
>>> Hi Vijay,
>>> It is already numeric field.
>>> It is huge difference between json and flat here. Do you know the 
>>> reason for this? Is there a way to improve it ?
>>>
>>> -----Original Message-----
>>> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
>>> Sent: Sunday, April 30, 2017 9:58 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: JSON facet performance for aggregations
>>>
>>> Json facet on string fields run lot slower than on numeric fields.
>>> Try and see if you can represent studentid as a numeric field.
>>>
>>> On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem"
>>> <mi...@oracle.com>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > I am trying to do aggregation with JSON faceting but performance 
>>> > is very bad for one of the requests:
>>> >
>>> > json.facet={
>>> >
>>> >    studentId:{
>>> >
>>> >       type:terms,
>>> >
>>> >       limit:-1,
>>> >
>>> >       field:"studentId",
>>> >
>>> >                   facet:{
>>> >
>>> >                   x:"sum(grades)"
>>> >
>>> >                   }
>>> >
>>> >    }
>>> >
>>> > }
>>> >
>>> >
>>> >
>>> > This request finishes in 250 seconds, and we can't paginate for 
>>> > this service for functional reason so we have to use limit:-1, and 
>>> > the cardinality of the studentId is 7500.
>>> >
>>> >
>>> >
>>> > If I try the same with flat facet it finishes in 3 seconds :
>>> > stats=true&facet=true&stats.field={!tag=piv1
>>> > sum=true}grades&facet.pivot={!stats=piv1}studentId
>>> >
>>> >
>>> >
>>> > We are hoping to use one approach json or flat for all our services.
>>> > JSON facet performance is better for many case.
>>> >
>>> >
>>> >
>>> > Please advise on why the performance for this is so bad and if we 
>>> > can improve it. Also what is the default algorithm used for json facet.
>>> >
>>> >
>>> >
>>> > Thanks
>>> >
>>> > Mikhail
>>> >
>>>

Re: JSON facet performance for aggregations

Posted by Yonik Seeley <ys...@gmail.com>.

OK, so I think I know what's going on.

The current code is more optimized for finding the top K buckets from
a total of N.
When one asks to return the top 10 buckets when there are potentially
millions of buckets, it makes sense to defer calculating other metrics
for those buckets until we know which ones they are.  After we
identify the top 10 buckets, we calculate the domain for that bucket
and use that to calculate the remaining metrics.

The current method is obviously much slower when one is requesting
*all* buckets.  We might as well just calculate all metrics in the
first pass rather than trying to defer them.

This inefficiency is compounded by the fact that the fields are not
indexed.  In the second phase, finding the domain for a bucket is a
field query.  For an indexed field, this would involve a single term
lookup.  For a non-indexed docValues field, this involves a full
column scan.
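
The difference between the two strategies can be illustrated with a toy Python model. This is only a sketch under simplifying assumptions, not Solr's actual code: the docs list stands in for a non-indexed docValues column, and the function names deferred_sum and single_pass_sum are made up for the illustration.

```python
from collections import defaultdict

# Toy column store: each entry is (studentId, grade).
docs = [(1.0, 80.0), (2.0, 70.0), (1.0, 90.0), (3.0, 60.0), (2.0, 75.0)]


def deferred_sum(docs):
    """Two-phase approach: collect bucket counts first, then compute each
    bucket's metric by rescanning the whole column to find its domain."""
    counts = defaultdict(int)
    for student_id, _ in docs:           # phase 1: bucket counts only
        counts[student_id] += 1
    sums = {}
    rows_scanned = len(docs)
    for bucket in counts:                # phase 2: one full scan per bucket
        total = 0.0
        for student_id, grade in docs:
            rows_scanned += 1
            if student_id == bucket:
                total += grade
        sums[bucket] = total
    return sums, rows_scanned


def single_pass_sum(docs):
    """Compute the metric for every bucket during the first and only scan."""
    sums = defaultdict(float)
    for student_id, grade in docs:
        sums[student_id] += grade
    return dict(sums), len(docs)
```

Both functions return the same sums, but the deferred version visits the column once per bucket. In this model, 7500 buckets over 1.5 million records would mean on the order of 7500 x 1.5 million row visits in the second phase, against a single scan of 1.5 million rows when nothing is deferred. Deferring only pays off when a small top K is kept out of a large N.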

If you ever want to do quick lookups on studentId, it would make sense
for it to be indexed (and why is it a double, anyway?)

I'll open up a JIRA issue for the first problem (don't defer metrics
if we're going to return all buckets anyway)

-Yonik


On Sun, Apr 30, 2017 at 8:58 AM, Mikhail Ibraheem
<mi...@oracle.com> wrote:
> Hi Yonik,
> We are using Solr 6.5
> Both studentId and grades are double:
>   <fieldType name="double" class="solr.TrieDoubleField" indexed="false" stored="true" docValues="true" multiValued="false" required="false"/>
>
> We have 1.5 million records.
>
> Thanks
> Mikhail
>
> -----Original Message-----
> From: Yonik Seeley [mailto:yseeley@gmail.com]
> Sent: Sunday, April 30, 2017 1:04 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facet performance for aggregations
>
> It is odd there would be quite such a big performance delta.
> What version of solr are you using?
> What is the fieldType of "grades"?
> -Yonik
>
>
> On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem <mi...@oracle.com> wrote:
>> 1-
>> studentId has docValue = true . it is of type double which is
>> <fieldType name="double" class="solr.TrieDoubleField" indexed="false"
>> stored="true" docValues="true" multiValued="false" required="false"/>
>>
>>
>> 2- If we just facet without aggregation it finishes in good time 60ms:
>>
>> json.facet={
>>    studentId:{
>>       type:terms,
>>       limit:-1,
>>       field:" studentId "
>>
>>    }
>> }
>>
>>
>> Thanks
>>
>>
>> -----Original Message-----
>> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
>> Sent: Sunday, April 30, 2017 10:44 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: JSON facet performance for aggregations
>>
>> Please enable doc values and try.
>> There is a bug in the source code which causes json facet on string field to run very slow. On numeric fields it runs fine with doc value enabled.
>>
>> On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem"
>> <mi...@oracle.com>
>> wrote:
>>
>>> Hi Vijay,
>>> It is already numeric field.
>>> It is huge difference between json and flat here. Do you know the
>>> reason for this? Is there a way to improve it ?
>>>
>>> -----Original Message-----
>>> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
>>> Sent: Sunday, April 30, 2017 9:58 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: JSON facet performance for aggregations
>>>
>>> Json facet on string fields run lot slower than on numeric fields.
>>> Try and see if you can represent studentid as a numeric field.
>>>
>>> On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem"
>>> <mi...@oracle.com>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > I am trying to do aggregation with JSON faceting but performance is
>>> > very bad for one of the requests:
>>> >
>>> > json.facet={
>>> >
>>> >    studentId:{
>>> >
>>> >       type:terms,
>>> >
>>> >       limit:-1,
>>> >
>>> >       field:"studentId",
>>> >
>>> >                   facet:{
>>> >
>>> >                   x:"sum(grades)"
>>> >
>>> >                   }
>>> >
>>> >    }
>>> >
>>> > }
>>> >
>>> >
>>> >
>>> > This request finishes in 250 seconds, and we can't paginate for
>>> > this service for functional reason so we have to use limit:-1, and
>>> > the cardinality of the studentId is 7500.
>>> >
>>> >
>>> >
>>> > If I try the same with flat facet it finishes in 3 seconds :
>>> > stats=true&facet=true&stats.field={!tag=piv1
>>> > sum=true}grades&facet.pivot={!stats=piv1}studentId
>>> >
>>> >
>>> >
>>> > We are hoping to use one approach json or flat for all our services.
>>> > JSON facet performance is better for many case.
>>> >
>>> >
>>> >
>>> > Please advise on why the performance for this is so bad and if we
>>> > can improve it. Also what is the default algorithm used for json facet.
>>> >
>>> >
>>> >
>>> > Thanks
>>> >
>>> > Mikhail
>>> >
>>>

RE: JSON facet performance for aggregations

Posted by Mikhail Ibraheem <mi...@oracle.com>.

Hi Yonik,
We are using Solr 6.5
Both studentId and grades are double:
  <fieldType name="double" class="solr.TrieDoubleField" indexed="false" stored="true" docValues="true" multiValued="false" required="false"/>

We have 1.5 million records.

Thanks
Mikhail

-----Original Message-----
From: Yonik Seeley [mailto:yseeley@gmail.com] 
Sent: Sunday, April 30, 2017 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

It is odd there would be quite such a big performance delta.
What version of solr are you using?
What is the fieldType of "grades"?
-Yonik


On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem <mi...@oracle.com> wrote:
> 1-
> studentId has docValue = true . it is of type double which is 
> <fieldType name="double" class="solr.TrieDoubleField" indexed="false" 
> stored="true" docValues="true" multiValued="false" required="false"/>
>
>
> 2- If we just facet without aggregation it finishes in good time 60ms:
>
> json.facet={
>    studentId:{
>       type:terms,
>       limit:-1,
>       field:" studentId "
>
>    }
> }
>
>
> Thanks
>
>
> -----Original Message-----
> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
> Sent: Sunday, April 30, 2017 10:44 AM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facet performance for aggregations
>
> Please enable doc values and try.
> There is a bug in the source code which causes json facet on string field to run very slow. On numeric fields it runs fine with doc value enabled.
>
> On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem" 
> <mi...@oracle.com>
> wrote:
>
>> Hi Vijay,
>> It is already numeric field.
>> It is huge difference between json and flat here. Do you know the 
>> reason for this? Is there a way to improve it ?
>>
>> -----Original Message-----
>> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
>> Sent: Sunday, April 30, 2017 9:58 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: JSON facet performance for aggregations
>>
>> Json facet on string fields run lot slower than on numeric fields. 
>> Try and see if you can represent studentid as a numeric field.
>>
>> On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem"
>> <mi...@oracle.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I am trying to do aggregation with JSON faceting but performance is 
>> > very bad for one of the requests:
>> >
>> > json.facet={
>> >
>> >    studentId:{
>> >
>> >       type:terms,
>> >
>> >       limit:-1,
>> >
>> >       field:"studentId",
>> >
>> >                   facet:{
>> >
>> >                   x:"sum(grades)"
>> >
>> >                   }
>> >
>> >    }
>> >
>> > }
>> >
>> >
>> >
>> > This request finishes in 250 seconds, and we can't paginate for 
>> > this service for functional reason so we have to use limit:-1, and 
>> > the cardinality of the studentId is 7500.
>> >
>> >
>> >
>> > If I try the same with flat facet it finishes in 3 seconds :
>> > stats=true&facet=true&stats.field={!tag=piv1
>> > sum=true}grades&facet.pivot={!stats=piv1}studentId
>> >
>> >
>> >
>> > We are hoping to use one approach json or flat for all our services.
>> > JSON facet performance is better for many case.
>> >
>> >
>> >
>> > Please advise on why the performance for this is so bad and if we 
>> > can improve it. Also what is the default algorithm used for json facet.
>> >
>> >
>> >
>> > Thanks
>> >
>> > Mikhail
>> >
>>

Re: JSON facet performance for aggregations

Posted by Yonik Seeley <ys...@gmail.com>.

It is odd that there would be such a big performance delta.
What version of solr are you using?
What is the fieldType of "grades"?
-Yonik


On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem
<mi...@oracle.com> wrote:
> 1-
> studentId has docValue = true . it is of type double which is <fieldType name="double" class="solr.TrieDoubleField" indexed="false" stored="true" docValues="true" multiValued="false" required="false"/>
>
>
> 2- If we just facet without aggregation it finishes in good time 60ms:
>
> json.facet={
>    studentId:{
>       type:terms,
>       limit:-1,
>       field:" studentId "
>
>    }
> }
>
>
> Thanks
>
>
> -----Original Message-----
> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
> Sent: Sunday, April 30, 2017 10:44 AM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facet performance for aggregations
>
> Please enable doc values and try.
> There is a bug in the source code which causes json facet on string field to run very slow. On numeric fields it runs fine with doc value enabled.
>
> On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem" <mi...@oracle.com>
> wrote:
>
>> Hi Vijay,
>> It is already numeric field.
>> It is huge difference between json and flat here. Do you know the
>> reason for this? Is there a way to improve it ?
>>
>> -----Original Message-----
>> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
>> Sent: Sunday, April 30, 2017 9:58 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: JSON facet performance for aggregations
>>
>> Json facet on string fields run lot slower than on numeric fields. Try
>> and see if you can represent studentid as a numeric field.
>>
>> On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem"
>> <mi...@oracle.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I am trying to do aggregation with JSON faceting but performance is
>> > very bad for one of the requests:
>> >
>> > json.facet={
>> >
>> >    studentId:{
>> >
>> >       type:terms,
>> >
>> >       limit:-1,
>> >
>> >       field:"studentId",
>> >
>> >                   facet:{
>> >
>> >                   x:"sum(grades)"
>> >
>> >                   }
>> >
>> >    }
>> >
>> > }
>> >
>> >
>> >
>> > This request finishes in 250 seconds, and we can't paginate for this
>> > service for functional reason so we have to use limit:-1, and the
>> > cardinality of the studentId is 7500.
>> >
>> >
>> >
>> > If I try the same with flat facet it finishes in 3 seconds :
>> > stats=true&facet=true&stats.field={!tag=piv1
>> > sum=true}grades&facet.pivot={!stats=piv1}studentId
>> >
>> >
>> >
>> > We are hoping to use one approach json or flat for all our services.
>> > JSON facet performance is better for many case.
>> >
>> >
>> >
>> > Please advise on why the performance for this is so bad and if we
>> > can improve it. Also what is the default algorithm used for json facet.
>> >
>> >
>> >
>> > Thanks
>> >
>> > Mikhail
>> >
>>

RE: JSON facet performance for aggregations

Posted by Mikhail Ibraheem <mi...@oracle.com>.

1- studentId has docValues="true". It is of type double, which is defined as:
<fieldType name="double" class="solr.TrieDoubleField" indexed="false" stored="true" docValues="true" multiValued="false" required="false"/>


2- If we just facet without aggregation, it finishes quickly, in 60 ms:

json.facet={
   studentId:{
      type:terms,
      limit:-1,
      field:"studentId"
   }
}


Thanks


-----Original Message-----
From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com] 
Sent: Sunday, April 30, 2017 10:44 AM
To: solr-user@lucene.apache.org
Subject: RE: JSON facet performance for aggregations

Please enable doc values and try.
There is a bug in the source code which causes json facet on string field to run very slow. On numeric fields it runs fine with doc value enabled.

On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem" <mi...@oracle.com>
wrote:

> Hi Vijay,
> It is already numeric field.
> It is huge difference between json and flat here. Do you know the 
> reason for this? Is there a way to improve it ?
>
> -----Original Message-----
> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
> Sent: Sunday, April 30, 2017 9:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facet performance for aggregations
>
> Json facet on string fields run lot slower than on numeric fields. Try 
> and see if you can represent studentid as a numeric field.
>
> On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem" 
> <mi...@oracle.com>
> wrote:
>
> > Hi,
> >
> > I am trying to do aggregation with JSON faceting but performance is 
> > very bad for one of the requests:
> >
> > json.facet={
> >
> >    studentId:{
> >
> >       type:terms,
> >
> >       limit:-1,
> >
> >       field:"studentId",
> >
> >                   facet:{
> >
> >                   x:"sum(grades)"
> >
> >                   }
> >
> >    }
> >
> > }
> >
> >
> >
> > This request finishes in 250 seconds, and we can't paginate for this 
> > service for functional reason so we have to use limit:-1, and the 
> > cardinality of the studentId is 7500.
> >
> >
> >
> > If I try the same with flat facet it finishes in 3 seconds :
> > stats=true&facet=true&stats.field={!tag=piv1
> > sum=true}grades&facet.pivot={!stats=piv1}studentId
> >
> >
> >
> > We are hoping to use one approach json or flat for all our services.
> > JSON facet performance is better for many case.
> >
> >
> >
> > Please advise on why the performance for this is so bad and if we 
> > can improve it. Also what is the default algorithm used for json facet.
> >
> >
> >
> > Thanks
> >
> > Mikhail
> >
>

RE: JSON facet performance for aggregations

Posted by Vijay Tiwary <vi...@gmail.com>.

Please enable docValues and try again.
There is a bug in the source code that causes JSON faceting on string fields
to run very slowly. On numeric fields it runs fine with docValues enabled.

On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem" <mi...@oracle.com>
wrote:

> Hi Vijay,
> It is already numeric field.
> It is huge difference between json and flat here. Do you know the reason
> for this? Is there a way to improve it ?
>
> -----Original Message-----
> From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com]
> Sent: Sunday, April 30, 2017 9:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facet performance for aggregations
>
> Json facet on string fields run lot slower than on numeric fields. Try and
> see if you can represent studentid as a numeric field.
>
> On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem" <mi...@oracle.com>
> wrote:
>
> > Hi,
> >
> > I am trying to do aggregation with JSON faceting but performance is
> > very bad for one of the requests:
> >
> > json.facet={
> >
> >    studentId:{
> >
> >       type:terms,
> >
> >       limit:-1,
> >
> >       field:"studentId",
> >
> >                   facet:{
> >
> >                   x:"sum(grades)"
> >
> >                   }
> >
> >    }
> >
> > }
> >
> >
> >
> > This request finishes in 250 seconds, and we can't paginate for this
> > service for functional reason so we have to use limit:-1, and the
> > cardinality of the studentId is 7500.
> >
> >
> >
> > If I try the same with flat facet it finishes in 3 seconds :
> > stats=true&facet=true&stats.field={!tag=piv1
> > sum=true}grades&facet.pivot={!stats=piv1}studentId
> >
> >
> >
> > We are hoping to use one approach json or flat for all our services.
> > JSON facet performance is better for many case.
> >
> >
> >
> > Please advise on why the performance for this is so bad and if we can
> > improve it. Also what is the default algorithm used for json facet.
> >
> >
> >
> > Thanks
> >
> > Mikhail
> >
>

RE: JSON facet performance for aggregations

Posted by Mikhail Ibraheem <mi...@oracle.com>.

Hi Vijay,
It is already a numeric field.
There is a huge difference between JSON and flat faceting here. Do you know the reason for this? Is there a way to improve it?

-----Original Message-----
From: Vijay Tiwary [mailto:vijaykr.tiwary@gmail.com] 
Sent: Sunday, April 30, 2017 9:58 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

Json facet on string fields run lot slower than on numeric fields. Try and see if you can represent studentid as a numeric field.

On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem" <mi...@oracle.com>
wrote:

> Hi,
>
> I am trying to do aggregation with JSON faceting but performance is 
> very bad for one of the requests:
>
> json.facet={
>
>    studentId:{
>
>       type:terms,
>
>       limit:-1,
>
>       field:"studentId",
>
>                   facet:{
>
>                   x:"sum(grades)"
>
>                   }
>
>    }
>
> }
>
>
>
> This request finishes in 250 seconds, and we can't paginate for this 
> service for functional reason so we have to use limit:-1, and the 
> cardinality of the studentId is 7500.
>
>
>
> If I try the same with flat facet it finishes in 3 seconds :
> stats=true&facet=true&stats.field={!tag=piv1
> sum=true}grades&facet.pivot={!stats=piv1}studentId
>
>
>
> We are hoping to use one approach json or flat for all our services. 
> JSON facet performance is better for many case.
>
>
>
> Please advise on why the performance for this is so bad and if we can 
> improve it. Also what is the default algorithm used for json facet.
>
>
>
> Thanks
>
> Mikhail
>

Re: JSON facet performance for aggregations

Posted by Vijay Tiwary <vi...@gmail.com>.

JSON faceting on string fields runs a lot slower than on numeric fields. Try and
see if you can represent studentId as a numeric field.

On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem" <mi...@oracle.com>
wrote:

> Hi,
>
> I am trying to do aggregation with JSON faceting but performance is very
> bad for one of the requests:
>
> json.facet={
>
>    studentId:{
>
>       type:terms,
>
>       limit:-1,
>
>       field:"studentId",
>
>                   facet:{
>
>                   x:"sum(grades)"
>
>                   }
>
>    }
>
> }
>
>
>
> This request finishes in 250 seconds, and we can't paginate for this
> service for functional reason so we have to use limit:-1, and the
> cardinality of the studentId is 7500.
>
>
>
> If I try the same with flat facet it finishes in 3 seconds :
> stats=true&facet=true&stats.field={!tag=piv1
> sum=true}grades&facet.pivot={!stats=piv1}studentId
>
>
>
> We are hoping to use one approach json or flat for all our services. JSON
> facet performance is better for many case.
>
>
>
> Please advise on why the performance for this is so bad and if we can
> improve it. Also what is the default algorithm used for json facet.
>
>
>
> Thanks
>
> Mikhail
>