You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by RAUNAK AGRAWAL <ag...@gmail.com> on 2017/12/12 00:32:24 UTC

Solr Aggregation queries are way slower than Elastic Search

Hi,

We have a use case where there are 4-5 dimensions and around 3500 metrics
in a single document. We have indexed only 2 dimensions and made all the
metrics as doc_values so that we can run the aggregation queries.

We have 6 million such documents and we are using solr cloud(6.6) on a 6
node cluster with 8 Vcores and 24 GB RAM each.

On the same set of hardware in elastic search we were getting the response
in about 10ms whereas in solr we are getting response in around 300-400 ms.

This is how I am querying the data.

private SolrQuery buildQuery(Integer variable1, List<Integer> groups,
List<String> metrics) {
    SolrQuery query = new SolrQuery();
    String groupQuery = " (" + groups.stream().map(g -> "group:" + g).collect
            (Collectors.joining(" OR ")) + ")";
    String finalQuery = "variable1:" + variable1 + " AND " + groupQuery;
    query.set("q", finalQuery);
    query.setRows(0);
    metrics.forEach(
            metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
    );
    return query;
}

Any help will be appreciated regarding this.


Thanks,

Raunak

Re: Solr Aggregation queries are way slower than Elastic Search

Posted by RAUNAK AGRAWAL <ag...@gmail.com>.
Thanks Yonik and Joel. I will try with JSON Facet API and update the
results here.

On Tue, Dec 12, 2017 at 10:56 PM, Yonik Seeley <ys...@gmail.com> wrote:

> On Tue, Dec 12, 2017 at 9:17 AM, RAUNAK AGRAWAL
> <ag...@gmail.com> wrote:
> > Hi Yonik,
> >
> > So if the query is fine then I guess even using JSON Facet API will not
> > help me here.
>
> As Joel mentioned, it's completely different code than the old stats API.
> This is a very simple use-case, so if we're slower than ES for some
> reason, it should be very easy to fix.
>
> -Yonik
>
>
> > On Tue, Dec 12, 2017 at 7:27 PM, Yonik Seeley <ys...@gmail.com> wrote:
> >
> >> OK great, so it's definitely not the main query (which is just a
> >> single term query in this example!)
> >>
> >> > Also I have looked into the JSON Facet API. If I have to use facets, I
> >> will
> >> > have to then define 3600 facets in a single query and I guess that
> would
> >> be
> >> > also slow.
> >>
> >> You can ask for any number of stats for a given facet (even the root
> >> facet bucket w/o faceting on any fields):
> >>
> >> cutl 'http://localhost:8983/solr/collection1.query?q=variable1:
> >> 290&rows=0&json.facet={
> >>   s1:"sum(metric_1)",
> >>   s2:"sum(metric_2)",
> >>   s3:"sum(metric_3)"
> >> }'
> >>
> >> -Yonik
> >>
> >>
> >> On Tue, Dec 12, 2017 at 5:40 AM, RAUNAK AGRAWAL
> >> <ag...@gmail.com> wrote:
> >> > Hi Yonik,
> >> >
> >> > As you asked here is the code snippet and the actual solr query.
> Please
> >> > have a look. I have included only 104 metrics but like this we can go
> >> upto
> >> > 3600 rollups.
> >> >
> >> > Also I have looked into the JSON Facet API. If I have to use facets, I
> >> will
> >> > have to then define 3600 facets in a single query and I guess that
> would
> >> be
> >> > also slow. Also is there any max limit on the number of facets we can
> >> > define in a single query?
> >> >
> >> > Code snippet:
> >> >
> >> > private SolrQuery buildQuery(Integer variable1, List<String> metrics)
> {
> >> >     SolrQuery query = new SolrQuery();
> >> >     query.set("q", "variable1:" + variable1);
> >> >     query.setRows(0);
> >> >     metrics.forEach(
> >> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
> >> metric)
> >> >     );
> >> >     return query;
> >> > }
> >> >
> >> >
> >> > The generated query:
> >> >
> >> > {! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
> >> > }metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
> >> > }metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
> >> > }metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
> >> > }metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
> >> > }metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
> >> > }metric_11' stats.field='{!sum=true }metric_12'
> >> > stats.field='{!sum=true }metric_13' stats.field='{!sum=true
> >> > }metric_14' stats.field='{!sum=true }metric_15'
> >> > stats.field='{!sum=true }metric_16' stats.field='{!sum=true
> >> > }metric_17' stats.field='{!sum=true }metric_18'
> >> > stats.field='{!sum=true }metric_19' stats.field='{!sum=true
> >> > }metric_20' stats.field='{!sum=true }metric_21'
> >> > stats.field='{!sum=true }metric_22' stats.field='{!sum=true
> >> > }metric_23' stats.field='{!sum=true }metric_24'
> >> > stats.field='{!sum=true }metric_25' stats.field='{!sum=true
> >> > }metric_26' stats.field='{!sum=true }metric_27'
> >> > stats.field='{!sum=true }metric_28' stats.field='{!sum=true
> >> > }metric_29' stats.field='{!sum=true }metric_30'
> >> > stats.field='{!sum=true }metric_31' stats.field='{!sum=true
> >> > }metric_32' stats.field='{!sum=true }metric_33'
> >> > stats.field='{!sum=true }metric_34' stats.field='{!sum=true
> >> > }metric_35' stats.field='{!sum=true }metric_36'
> >> > stats.field='{!sum=true }metric_37' stats.field='{!sum=true
> >> > }metric_38' stats.field='{!sum=true }metric_39'
> >> > stats.field='{!sum=true }metric_40' stats.field='{!sum=true
> >> > }metric_41' stats.field='{!sum=true }metric_42'
> >> > stats.field='{!sum=true }metric_43' stats.field='{!sum=true
> >> > }metric_44' stats.field='{!sum=true }metric_45'
> >> > stats.field='{!sum=true }metric_46' stats.field='{!sum=true
> >> > }metric_47' stats.field='{!sum=true }metric_48'
> >> > stats.field='{!sum=true }metric_49' stats.field='{!sum=true
> >> > }metric_50' stats.field='{!sum=true }metric_51'
> >> > stats.field='{!sum=true }metric_52' stats.field='{!sum=true
> >> > }metric_53' stats.field='{!sum=true }metric_54'
> >> > stats.field='{!sum=true }metric_55' stats.field='{!sum=true
> >> > }metric_56' stats.field='{!sum=true }metric_57'
> >> > stats.field='{!sum=true }metric_58' stats.field='{!sum=true
> >> > }metric_59' stats.field='{!sum=true }metric_60'
> >> > stats.field='{!sum=true }metric_61' stats.field='{!sum=true
> >> > }metric_62' stats.field='{!sum=true }metric_63'
> >> > stats.field='{!sum=true }metric_64' stats.field='{!sum=true
> >> > }metric_65' stats.field='{!sum=true }metric_66'
> >> > stats.field='{!sum=true }metric_67' stats.field='{!sum=true
> >> > }metric_68' stats.field='{!sum=true }metric_69'
> >> > stats.field='{!sum=true }metric_70' stats.field='{!sum=true
> >> > }metric_71' stats.field='{!sum=true }metric_72'
> >> > stats.field='{!sum=true }metric_73' stats.field='{!sum=true
> >> > }metric_74' stats.field='{!sum=true }metric_75'
> >> > stats.field='{!sum=true }metric_76' stats.field='{!sum=true
> >> > }metric_77' stats.field='{!sum=true }metric_78'
> >> > stats.field='{!sum=true }metric_79' stats.field='{!sum=true
> >> > }metric_80' stats.field='{!sum=true }metric_81'
> >> > stats.field='{!sum=true }metric_82' stats.field='{!sum=true
> >> > }metric_83' stats.field='{!sum=true }metric_84'
> >> > stats.field='{!sum=true }metric_85' stats.field='{!sum=true
> >> > }metric_86' stats.field='{!sum=true }metric_87'
> >> > stats.field='{!sum=true }metric_88' stats.field='{!sum=true
> >> > }metric_89' stats.field='{!sum=true }metric_90'
> >> > stats.field='{!sum=true }metric_91' stats.field='{!sum=true
> >> > }metric_92' stats.field='{!sum=true }metric_93'
> >> > stats.field='{!sum=true }metric_94' stats.field='{!sum=true
> >> > }metric_95' stats.field='{!sum=true }metric_96'
> >> > stats.field='{!sum=true }metric_97' stats.field='{!sum=true
> >> > }metric_98' stats.field='{!sum=true }metric_99'
> >> > stats.field='{!sum=true }metric_100' stats.field='{!sum=true
> >> > }metric_101' stats.field='{!sum=true }metric_102'
> >> > stats.field='{!sum=true }metric_103' stats.field='{!sum=true
> >> > }metric_104'}
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Dec 12, 2017 at 10:21 AM, RAUNAK AGRAWAL <
> >> agrawal.raunak@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi Yonik,
> >> >>
> >> >> I will try the JSON Facet API and update here but my hunch is that
> >> >> querying mechanism is not the problem. Rather the problem lies with
> the
> >> >> solr aggregations.
> >> >>
> >> >> Thanks
> >> >>
> >> >> On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ys...@gmail.com>
> >> wrote:
> >> >>
> >> >>> I think the SolrJ below uses the old stats component.
> >> >>> Hopefully the JSON Facet API would be faster for this, but it's not
> >> >>> completely clear what the main query here looks like, and if it's
> the
> >> >>> source of any bottleneck rather than the aggregations.
> >> >>> What does the generated query string actually look like (it may be
> >> >>> easiest just to pull this from the logs).
> >> >>>
> >> >>> -Yonik
> >> >>>
> >> >>>
> >> >>> On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
> >> >>> <ag...@gmail.com> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > We have a use case where there are 4-5 dimensions and around 3500
> >> >>> metrics
> >> >>> > in a single document. We have indexed only 2 dimensions and made
> all
> >> the
> >> >>> > metrics as doc_values so that we can run the aggregation queries.
> >> >>> >
> >> >>> > We have 6 million such documents and we are using solr cloud(6.6)
> on
> >> a 6
> >> >>> > node cluster with 8 Vcores and 24 GB RAM each.
> >> >>> >
> >> >>> > On the same set of hardware in elastic search we were getting the
> >> >>> response
> >> >>> > in about 10ms whereas in solr we are getting response in around
> >> 300-400
> >> >>> ms.
> >> >>> >
> >> >>> > This is how I am querying the data.
> >> >>> >
> >> >>> > private SolrQuery buildQuery(Integer variable1, List<Integer>
> groups,
> >> >>> > List<String> metrics) {
> >> >>> >     SolrQuery query = new SolrQuery();
> >> >>> >     String groupQuery = " (" + groups.stream().map(g -> "group:" +
> >> >>> g).collect
> >> >>> >             (Collectors.joining(" OR ")) + ")";
> >> >>> >     String finalQuery = "variable1:" + variable1 + " AND " +
> >> groupQuery;
> >> >>> >     query.set("q", finalQuery);
> >> >>> >     query.setRows(0);
> >> >>> >     metrics.forEach(
> >> >>> >             metric -> query.setGetFieldStatistics("{!sum=true }"
> +
> >> >>> metric)
> >> >>> >     );
> >> >>> >     return query;
> >> >>> > }
> >> >>> >
> >> >>> > Any help will be appreciated regarding this.
> >> >>> >
> >> >>> >
> >> >>> > Thanks,
> >> >>> >
> >> >>> > Raunak
> >> >>>
> >> >>
> >> >>
> >>
>

Re: Solr Aggregation queries are way slower than Elastic Search

Posted by Yonik Seeley <ys...@gmail.com>.
On Tue, Dec 12, 2017 at 9:17 AM, RAUNAK AGRAWAL
<ag...@gmail.com> wrote:
> Hi Yonik,
>
> So if the query is fine then I guess even using JSON Facet API will not
> help me here.

As Joel mentioned, it's completely different code than the old stats API.
This is a very simple use-case, so if we're slower than ES for some
reason, it should be very easy to fix.

-Yonik


> On Tue, Dec 12, 2017 at 7:27 PM, Yonik Seeley <ys...@gmail.com> wrote:
>
>> OK great, so it's definitely not the main query (which is just a
>> single term query in this example!)
>>
>> > Also I have looked into the JSON Facet API. If I have to use facets, I
>> will
>> > have to then define 3600 facets in a single query and I guess that would
>> be
>> > also slow.
>>
>> You can ask for any number of stats for a given facet (even the root
>> facet bucket w/o faceting on any fields):
>>
>> cutl 'http://localhost:8983/solr/collection1.query?q=variable1:
>> 290&rows=0&json.facet={
>>   s1:"sum(metric_1)",
>>   s2:"sum(metric_2)",
>>   s3:"sum(metric_3)"
>> }'
>>
>> -Yonik
>>
>>
>> On Tue, Dec 12, 2017 at 5:40 AM, RAUNAK AGRAWAL
>> <ag...@gmail.com> wrote:
>> > Hi Yonik,
>> >
>> > As you asked here is the code snippet and the actual solr query. Please
>> > have a look. I have included only 104 metrics but like this we can go
>> upto
>> > 3600 rollups.
>> >
>> > Also I have looked into the JSON Facet API. If I have to use facets, I
>> will
>> > have to then define 3600 facets in a single query and I guess that would
>> be
>> > also slow. Also is there any max limit on the number of facets we can
>> > define in a single query?
>> >
>> > Code snippet:
>> >
>> > private SolrQuery buildQuery(Integer variable1, List<String> metrics) {
>> >     SolrQuery query = new SolrQuery();
>> >     query.set("q", "variable1:" + variable1);
>> >     query.setRows(0);
>> >     metrics.forEach(
>> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
>> metric)
>> >     );
>> >     return query;
>> > }
>> >
>> >
>> > The generated query:
>> >
>> > {! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
>> > }metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
>> > }metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
>> > }metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
>> > }metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
>> > }metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
>> > }metric_11' stats.field='{!sum=true }metric_12'
>> > stats.field='{!sum=true }metric_13' stats.field='{!sum=true
>> > }metric_14' stats.field='{!sum=true }metric_15'
>> > stats.field='{!sum=true }metric_16' stats.field='{!sum=true
>> > }metric_17' stats.field='{!sum=true }metric_18'
>> > stats.field='{!sum=true }metric_19' stats.field='{!sum=true
>> > }metric_20' stats.field='{!sum=true }metric_21'
>> > stats.field='{!sum=true }metric_22' stats.field='{!sum=true
>> > }metric_23' stats.field='{!sum=true }metric_24'
>> > stats.field='{!sum=true }metric_25' stats.field='{!sum=true
>> > }metric_26' stats.field='{!sum=true }metric_27'
>> > stats.field='{!sum=true }metric_28' stats.field='{!sum=true
>> > }metric_29' stats.field='{!sum=true }metric_30'
>> > stats.field='{!sum=true }metric_31' stats.field='{!sum=true
>> > }metric_32' stats.field='{!sum=true }metric_33'
>> > stats.field='{!sum=true }metric_34' stats.field='{!sum=true
>> > }metric_35' stats.field='{!sum=true }metric_36'
>> > stats.field='{!sum=true }metric_37' stats.field='{!sum=true
>> > }metric_38' stats.field='{!sum=true }metric_39'
>> > stats.field='{!sum=true }metric_40' stats.field='{!sum=true
>> > }metric_41' stats.field='{!sum=true }metric_42'
>> > stats.field='{!sum=true }metric_43' stats.field='{!sum=true
>> > }metric_44' stats.field='{!sum=true }metric_45'
>> > stats.field='{!sum=true }metric_46' stats.field='{!sum=true
>> > }metric_47' stats.field='{!sum=true }metric_48'
>> > stats.field='{!sum=true }metric_49' stats.field='{!sum=true
>> > }metric_50' stats.field='{!sum=true }metric_51'
>> > stats.field='{!sum=true }metric_52' stats.field='{!sum=true
>> > }metric_53' stats.field='{!sum=true }metric_54'
>> > stats.field='{!sum=true }metric_55' stats.field='{!sum=true
>> > }metric_56' stats.field='{!sum=true }metric_57'
>> > stats.field='{!sum=true }metric_58' stats.field='{!sum=true
>> > }metric_59' stats.field='{!sum=true }metric_60'
>> > stats.field='{!sum=true }metric_61' stats.field='{!sum=true
>> > }metric_62' stats.field='{!sum=true }metric_63'
>> > stats.field='{!sum=true }metric_64' stats.field='{!sum=true
>> > }metric_65' stats.field='{!sum=true }metric_66'
>> > stats.field='{!sum=true }metric_67' stats.field='{!sum=true
>> > }metric_68' stats.field='{!sum=true }metric_69'
>> > stats.field='{!sum=true }metric_70' stats.field='{!sum=true
>> > }metric_71' stats.field='{!sum=true }metric_72'
>> > stats.field='{!sum=true }metric_73' stats.field='{!sum=true
>> > }metric_74' stats.field='{!sum=true }metric_75'
>> > stats.field='{!sum=true }metric_76' stats.field='{!sum=true
>> > }metric_77' stats.field='{!sum=true }metric_78'
>> > stats.field='{!sum=true }metric_79' stats.field='{!sum=true
>> > }metric_80' stats.field='{!sum=true }metric_81'
>> > stats.field='{!sum=true }metric_82' stats.field='{!sum=true
>> > }metric_83' stats.field='{!sum=true }metric_84'
>> > stats.field='{!sum=true }metric_85' stats.field='{!sum=true
>> > }metric_86' stats.field='{!sum=true }metric_87'
>> > stats.field='{!sum=true }metric_88' stats.field='{!sum=true
>> > }metric_89' stats.field='{!sum=true }metric_90'
>> > stats.field='{!sum=true }metric_91' stats.field='{!sum=true
>> > }metric_92' stats.field='{!sum=true }metric_93'
>> > stats.field='{!sum=true }metric_94' stats.field='{!sum=true
>> > }metric_95' stats.field='{!sum=true }metric_96'
>> > stats.field='{!sum=true }metric_97' stats.field='{!sum=true
>> > }metric_98' stats.field='{!sum=true }metric_99'
>> > stats.field='{!sum=true }metric_100' stats.field='{!sum=true
>> > }metric_101' stats.field='{!sum=true }metric_102'
>> > stats.field='{!sum=true }metric_103' stats.field='{!sum=true
>> > }metric_104'}
>> >
>> >
>> >
>> >
>> > On Tue, Dec 12, 2017 at 10:21 AM, RAUNAK AGRAWAL <
>> agrawal.raunak@gmail.com>
>> > wrote:
>> >
>> >> Hi Yonik,
>> >>
>> >> I will try the JSON Facet API and update here but my hunch is that
>> >> querying mechanism is not the problem. Rather the problem lies with the
>> >> solr aggregations.
>> >>
>> >> Thanks
>> >>
>> >> On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ys...@gmail.com>
>> wrote:
>> >>
>> >>> I think the SolrJ below uses the old stats component.
>> >>> Hopefully the JSON Facet API would be faster for this, but it's not
>> >>> completely clear what the main query here looks like, and if it's the
>> >>> source of any bottleneck rather than the aggregations.
>> >>> What does the generated query string actually look like (it may be
>> >>> easiest just to pull this from the logs).
>> >>>
>> >>> -Yonik
>> >>>
>> >>>
>> >>> On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
>> >>> <ag...@gmail.com> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > We have a use case where there are 4-5 dimensions and around 3500
>> >>> metrics
>> >>> > in a single document. We have indexed only 2 dimensions and made all
>> the
>> >>> > metrics as doc_values so that we can run the aggregation queries.
>> >>> >
>> >>> > We have 6 million such documents and we are using solr cloud(6.6) on
>> a 6
>> >>> > node cluster with 8 Vcores and 24 GB RAM each.
>> >>> >
>> >>> > On the same set of hardware in elastic search we were getting the
>> >>> response
>> >>> > in about 10ms whereas in solr we are getting response in around
>> 300-400
>> >>> ms.
>> >>> >
>> >>> > This is how I am querying the data.
>> >>> >
>> >>> > private SolrQuery buildQuery(Integer variable1, List<Integer> groups,
>> >>> > List<String> metrics) {
>> >>> >     SolrQuery query = new SolrQuery();
>> >>> >     String groupQuery = " (" + groups.stream().map(g -> "group:" +
>> >>> g).collect
>> >>> >             (Collectors.joining(" OR ")) + ")";
>> >>> >     String finalQuery = "variable1:" + variable1 + " AND " +
>> groupQuery;
>> >>> >     query.set("q", finalQuery);
>> >>> >     query.setRows(0);
>> >>> >     metrics.forEach(
>> >>> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
>> >>> metric)
>> >>> >     );
>> >>> >     return query;
>> >>> > }
>> >>> >
>> >>> > Any help will be appreciated regarding this.
>> >>> >
>> >>> >
>> >>> > Thanks,
>> >>> >
>> >>> > Raunak
>> >>>
>> >>
>> >>
>>

Re: Solr Aggregation queries are way slower than Elastic Search

Posted by Joel Bernstein <jo...@gmail.com>.
The JSON facet API is a different aggregation implementation. So, it will
definitely have different performance characteristics.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Dec 12, 2017 at 9:17 AM, RAUNAK AGRAWAL <ag...@gmail.com>
wrote:

> Hi Yonik,
>
> So if the query is fine then I guess even using JSON Facet API will not
> help me here. Can you suggest me some other idea or further tuning which
> will help me in reducing the latency?
>
> On Tue, Dec 12, 2017 at 7:27 PM, Yonik Seeley <ys...@gmail.com> wrote:
>
> > OK great, so it's definitely not the main query (which is just a
> > single term query in this example!)
> >
> > > Also I have looked into the JSON Facet API. If I have to use facets, I
> > will
> > > have to then define 3600 facets in a single query and I guess that
> would
> > be
> > > also slow.
> >
> > You can ask for any number of stats for a given facet (even the root
> > facet bucket w/o faceting on any fields):
> >
> > cutl 'http://localhost:8983/solr/collection1.query?q=variable1:
> > 290&rows=0&json.facet={
> >   s1:"sum(metric_1)",
> >   s2:"sum(metric_2)",
> >   s3:"sum(metric_3)"
> > }'
> >
> > -Yonik
> >
> >
> > On Tue, Dec 12, 2017 at 5:40 AM, RAUNAK AGRAWAL
> > <ag...@gmail.com> wrote:
> > > Hi Yonik,
> > >
> > > As you asked here is the code snippet and the actual solr query. Please
> > > have a look. I have included only 104 metrics but like this we can go
> > upto
> > > 3600 rollups.
> > >
> > > Also I have looked into the JSON Facet API. If I have to use facets, I
> > will
> > > have to then define 3600 facets in a single query and I guess that
> would
> > be
> > > also slow. Also is there any max limit on the number of facets we can
> > > define in a single query?
> > >
> > > Code snippet:
> > >
> > > private SolrQuery buildQuery(Integer variable1, List<String> metrics) {
> > >     SolrQuery query = new SolrQuery();
> > >     query.set("q", "variable1:" + variable1);
> > >     query.setRows(0);
> > >     metrics.forEach(
> > >             metric -> query.setGetFieldStatistics("{!sum=true }" +
> > metric)
> > >     );
> > >     return query;
> > > }
> > >
> > >
> > > The generated query:
> > >
> > > {! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
> > > }metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
> > > }metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
> > > }metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
> > > }metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
> > > }metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
> > > }metric_11' stats.field='{!sum=true }metric_12'
> > > stats.field='{!sum=true }metric_13' stats.field='{!sum=true
> > > }metric_14' stats.field='{!sum=true }metric_15'
> > > stats.field='{!sum=true }metric_16' stats.field='{!sum=true
> > > }metric_17' stats.field='{!sum=true }metric_18'
> > > stats.field='{!sum=true }metric_19' stats.field='{!sum=true
> > > }metric_20' stats.field='{!sum=true }metric_21'
> > > stats.field='{!sum=true }metric_22' stats.field='{!sum=true
> > > }metric_23' stats.field='{!sum=true }metric_24'
> > > stats.field='{!sum=true }metric_25' stats.field='{!sum=true
> > > }metric_26' stats.field='{!sum=true }metric_27'
> > > stats.field='{!sum=true }metric_28' stats.field='{!sum=true
> > > }metric_29' stats.field='{!sum=true }metric_30'
> > > stats.field='{!sum=true }metric_31' stats.field='{!sum=true
> > > }metric_32' stats.field='{!sum=true }metric_33'
> > > stats.field='{!sum=true }metric_34' stats.field='{!sum=true
> > > }metric_35' stats.field='{!sum=true }metric_36'
> > > stats.field='{!sum=true }metric_37' stats.field='{!sum=true
> > > }metric_38' stats.field='{!sum=true }metric_39'
> > > stats.field='{!sum=true }metric_40' stats.field='{!sum=true
> > > }metric_41' stats.field='{!sum=true }metric_42'
> > > stats.field='{!sum=true }metric_43' stats.field='{!sum=true
> > > }metric_44' stats.field='{!sum=true }metric_45'
> > > stats.field='{!sum=true }metric_46' stats.field='{!sum=true
> > > }metric_47' stats.field='{!sum=true }metric_48'
> > > stats.field='{!sum=true }metric_49' stats.field='{!sum=true
> > > }metric_50' stats.field='{!sum=true }metric_51'
> > > stats.field='{!sum=true }metric_52' stats.field='{!sum=true
> > > }metric_53' stats.field='{!sum=true }metric_54'
> > > stats.field='{!sum=true }metric_55' stats.field='{!sum=true
> > > }metric_56' stats.field='{!sum=true }metric_57'
> > > stats.field='{!sum=true }metric_58' stats.field='{!sum=true
> > > }metric_59' stats.field='{!sum=true }metric_60'
> > > stats.field='{!sum=true }metric_61' stats.field='{!sum=true
> > > }metric_62' stats.field='{!sum=true }metric_63'
> > > stats.field='{!sum=true }metric_64' stats.field='{!sum=true
> > > }metric_65' stats.field='{!sum=true }metric_66'
> > > stats.field='{!sum=true }metric_67' stats.field='{!sum=true
> > > }metric_68' stats.field='{!sum=true }metric_69'
> > > stats.field='{!sum=true }metric_70' stats.field='{!sum=true
> > > }metric_71' stats.field='{!sum=true }metric_72'
> > > stats.field='{!sum=true }metric_73' stats.field='{!sum=true
> > > }metric_74' stats.field='{!sum=true }metric_75'
> > > stats.field='{!sum=true }metric_76' stats.field='{!sum=true
> > > }metric_77' stats.field='{!sum=true }metric_78'
> > > stats.field='{!sum=true }metric_79' stats.field='{!sum=true
> > > }metric_80' stats.field='{!sum=true }metric_81'
> > > stats.field='{!sum=true }metric_82' stats.field='{!sum=true
> > > }metric_83' stats.field='{!sum=true }metric_84'
> > > stats.field='{!sum=true }metric_85' stats.field='{!sum=true
> > > }metric_86' stats.field='{!sum=true }metric_87'
> > > stats.field='{!sum=true }metric_88' stats.field='{!sum=true
> > > }metric_89' stats.field='{!sum=true }metric_90'
> > > stats.field='{!sum=true }metric_91' stats.field='{!sum=true
> > > }metric_92' stats.field='{!sum=true }metric_93'
> > > stats.field='{!sum=true }metric_94' stats.field='{!sum=true
> > > }metric_95' stats.field='{!sum=true }metric_96'
> > > stats.field='{!sum=true }metric_97' stats.field='{!sum=true
> > > }metric_98' stats.field='{!sum=true }metric_99'
> > > stats.field='{!sum=true }metric_100' stats.field='{!sum=true
> > > }metric_101' stats.field='{!sum=true }metric_102'
> > > stats.field='{!sum=true }metric_103' stats.field='{!sum=true
> > > }metric_104'}
> > >
> > >
> > >
> > >
> > > On Tue, Dec 12, 2017 at 10:21 AM, RAUNAK AGRAWAL <
> > agrawal.raunak@gmail.com>
> > > wrote:
> > >
> > >> Hi Yonik,
> > >>
> > >> I will try the JSON Facet API and update here but my hunch is that
> > >> querying mechanism is not the problem. Rather the problem lies with
> the
> > >> solr aggregations.
> > >>
> > >> Thanks
> > >>
> > >> On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ys...@gmail.com>
> > wrote:
> > >>
> > >>> I think the SolrJ below uses the old stats component.
> > >>> Hopefully the JSON Facet API would be faster for this, but it's not
> > >>> completely clear what the main query here looks like, and if it's the
> > >>> source of any bottleneck rather than the aggregations.
> > >>> What does the generated query string actually look like (it may be
> > >>> easiest just to pull this from the logs).
> > >>>
> > >>> -Yonik
> > >>>
> > >>>
> > >>> On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
> > >>> <ag...@gmail.com> wrote:
> > >>> > Hi,
> > >>> >
> > >>> > We have a use case where there are 4-5 dimensions and around 3500
> > >>> metrics
> > >>> > in a single document. We have indexed only 2 dimensions and made
> all
> > the
> > >>> > metrics as doc_values so that we can run the aggregation queries.
> > >>> >
> > >>> > We have 6 million such documents and we are using solr cloud(6.6)
> on
> > a 6
> > >>> > node cluster with 8 Vcores and 24 GB RAM each.
> > >>> >
> > >>> > On the same set of hardware in elastic search we were getting the
> > >>> response
> > >>> > in about 10ms whereas in solr we are getting response in around
> > 300-400
> > >>> ms.
> > >>> >
> > >>> > This is how I am querying the data.
> > >>> >
> > >>> > private SolrQuery buildQuery(Integer variable1, List<Integer>
> groups,
> > >>> > List<String> metrics) {
> > >>> >     SolrQuery query = new SolrQuery();
> > >>> >     String groupQuery = " (" + groups.stream().map(g -> "group:" +
> > >>> g).collect
> > >>> >             (Collectors.joining(" OR ")) + ")";
> > >>> >     String finalQuery = "variable1:" + variable1 + " AND " +
> > groupQuery;
> > >>> >     query.set("q", finalQuery);
> > >>> >     query.setRows(0);
> > >>> >     metrics.forEach(
> > >>> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
> > >>> metric)
> > >>> >     );
> > >>> >     return query;
> > >>> > }
> > >>> >
> > >>> > Any help will be appreciated regarding this.
> > >>> >
> > >>> >
> > >>> > Thanks,
> > >>> >
> > >>> > Raunak
> > >>>
> > >>
> > >>
> >
>

Re: Solr Aggregation queries are way slower than Elastic Search

Posted by RAUNAK AGRAWAL <ag...@gmail.com>.
Hi Yonik,

So if the query is fine then I guess even using JSON Facet API will not
help me here. Can you suggest me some other idea or further tuning which
will help me in reducing the latency?

On Tue, Dec 12, 2017 at 7:27 PM, Yonik Seeley <ys...@gmail.com> wrote:

> OK great, so it's definitely not the main query (which is just a
> single term query in this example!)
>
> > Also I have looked into the JSON Facet API. If I have to use facets, I
> will
> > have to then define 3600 facets in a single query and I guess that would
> be
> > also slow.
>
> You can ask for any number of stats for a given facet (even the root
> facet bucket w/o faceting on any fields):
>
> cutl 'http://localhost:8983/solr/collection1.query?q=variable1:
> 290&rows=0&json.facet={
>   s1:"sum(metric_1)",
>   s2:"sum(metric_2)",
>   s3:"sum(metric_3)"
> }'
>
> -Yonik
>
>
> On Tue, Dec 12, 2017 at 5:40 AM, RAUNAK AGRAWAL
> <ag...@gmail.com> wrote:
> > Hi Yonik,
> >
> > As you asked here is the code snippet and the actual solr query. Please
> > have a look. I have included only 104 metrics but like this we can go
> upto
> > 3600 rollups.
> >
> > Also I have looked into the JSON Facet API. If I have to use facets, I
> will
> > have to then define 3600 facets in a single query and I guess that would
> be
> > also slow. Also is there any max limit on the number of facets we can
> > define in a single query?
> >
> > Code snippet:
> >
> > private SolrQuery buildQuery(Integer variable1, List<String> metrics) {
> >     SolrQuery query = new SolrQuery();
> >     query.set("q", "variable1:" + variable1);
> >     query.setRows(0);
> >     metrics.forEach(
> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
> metric)
> >     );
> >     return query;
> > }
> >
> >
> > The generated query:
> >
> > {! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
> > }metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
> > }metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
> > }metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
> > }metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
> > }metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
> > }metric_11' stats.field='{!sum=true }metric_12'
> > stats.field='{!sum=true }metric_13' stats.field='{!sum=true
> > }metric_14' stats.field='{!sum=true }metric_15'
> > stats.field='{!sum=true }metric_16' stats.field='{!sum=true
> > }metric_17' stats.field='{!sum=true }metric_18'
> > stats.field='{!sum=true }metric_19' stats.field='{!sum=true
> > }metric_20' stats.field='{!sum=true }metric_21'
> > stats.field='{!sum=true }metric_22' stats.field='{!sum=true
> > }metric_23' stats.field='{!sum=true }metric_24'
> > stats.field='{!sum=true }metric_25' stats.field='{!sum=true
> > }metric_26' stats.field='{!sum=true }metric_27'
> > stats.field='{!sum=true }metric_28' stats.field='{!sum=true
> > }metric_29' stats.field='{!sum=true }metric_30'
> > stats.field='{!sum=true }metric_31' stats.field='{!sum=true
> > }metric_32' stats.field='{!sum=true }metric_33'
> > stats.field='{!sum=true }metric_34' stats.field='{!sum=true
> > }metric_35' stats.field='{!sum=true }metric_36'
> > stats.field='{!sum=true }metric_37' stats.field='{!sum=true
> > }metric_38' stats.field='{!sum=true }metric_39'
> > stats.field='{!sum=true }metric_40' stats.field='{!sum=true
> > }metric_41' stats.field='{!sum=true }metric_42'
> > stats.field='{!sum=true }metric_43' stats.field='{!sum=true
> > }metric_44' stats.field='{!sum=true }metric_45'
> > stats.field='{!sum=true }metric_46' stats.field='{!sum=true
> > }metric_47' stats.field='{!sum=true }metric_48'
> > stats.field='{!sum=true }metric_49' stats.field='{!sum=true
> > }metric_50' stats.field='{!sum=true }metric_51'
> > stats.field='{!sum=true }metric_52' stats.field='{!sum=true
> > }metric_53' stats.field='{!sum=true }metric_54'
> > stats.field='{!sum=true }metric_55' stats.field='{!sum=true
> > }metric_56' stats.field='{!sum=true }metric_57'
> > stats.field='{!sum=true }metric_58' stats.field='{!sum=true
> > }metric_59' stats.field='{!sum=true }metric_60'
> > stats.field='{!sum=true }metric_61' stats.field='{!sum=true
> > }metric_62' stats.field='{!sum=true }metric_63'
> > stats.field='{!sum=true }metric_64' stats.field='{!sum=true
> > }metric_65' stats.field='{!sum=true }metric_66'
> > stats.field='{!sum=true }metric_67' stats.field='{!sum=true
> > }metric_68' stats.field='{!sum=true }metric_69'
> > stats.field='{!sum=true }metric_70' stats.field='{!sum=true
> > }metric_71' stats.field='{!sum=true }metric_72'
> > stats.field='{!sum=true }metric_73' stats.field='{!sum=true
> > }metric_74' stats.field='{!sum=true }metric_75'
> > stats.field='{!sum=true }metric_76' stats.field='{!sum=true
> > }metric_77' stats.field='{!sum=true }metric_78'
> > stats.field='{!sum=true }metric_79' stats.field='{!sum=true
> > }metric_80' stats.field='{!sum=true }metric_81'
> > stats.field='{!sum=true }metric_82' stats.field='{!sum=true
> > }metric_83' stats.field='{!sum=true }metric_84'
> > stats.field='{!sum=true }metric_85' stats.field='{!sum=true
> > }metric_86' stats.field='{!sum=true }metric_87'
> > stats.field='{!sum=true }metric_88' stats.field='{!sum=true
> > }metric_89' stats.field='{!sum=true }metric_90'
> > stats.field='{!sum=true }metric_91' stats.field='{!sum=true
> > }metric_92' stats.field='{!sum=true }metric_93'
> > stats.field='{!sum=true }metric_94' stats.field='{!sum=true
> > }metric_95' stats.field='{!sum=true }metric_96'
> > stats.field='{!sum=true }metric_97' stats.field='{!sum=true
> > }metric_98' stats.field='{!sum=true }metric_99'
> > stats.field='{!sum=true }metric_100' stats.field='{!sum=true
> > }metric_101' stats.field='{!sum=true }metric_102'
> > stats.field='{!sum=true }metric_103' stats.field='{!sum=true
> > }metric_104'}
> >
> >
> >
> >
> > On Tue, Dec 12, 2017 at 10:21 AM, RAUNAK AGRAWAL <
> agrawal.raunak@gmail.com>
> > wrote:
> >
> >> Hi Yonik,
> >>
> >> I will try the JSON Facet API and update here but my hunch is that
> >> querying mechanism is not the problem. Rather the problem lies with the
> >> solr aggregations.
> >>
> >> Thanks
> >>
> >> On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ys...@gmail.com>
> wrote:
> >>
> >>> I think the SolrJ below uses the old stats component.
> >>> Hopefully the JSON Facet API would be faster for this, but it's not
> >>> completely clear what the main query here looks like, and if it's the
> >>> source of any bottleneck rather than the aggregations.
> >>> What does the generated query string actually look like (it may be
> >>> easiest just to pull this from the logs).
> >>>
> >>> -Yonik
> >>>
> >>>
> >>> On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
> >>> <ag...@gmail.com> wrote:
> >>> > Hi,
> >>> >
> >>> > We have a use case where there are 4-5 dimensions and around 3500
> >>> metrics
> >>> > in a single document. We have indexed only 2 dimensions and made all
> the
> >>> > metrics as doc_values so that we can run the aggregation queries.
> >>> >
> >>> > We have 6 million such documents and we are using solr cloud(6.6) on
> a 6
> >>> > node cluster with 8 Vcores and 24 GB RAM each.
> >>> >
> >>> > On the same set of hardware in elastic search we were getting the
> >>> response
> >>> > in about 10ms whereas in solr we are getting response in around
> 300-400
> >>> ms.
> >>> >
> >>> > This is how I am querying the data.
> >>> >
> >>> > private SolrQuery buildQuery(Integer variable1, List<Integer> groups,
> >>> > List<String> metrics) {
> >>> >     SolrQuery query = new SolrQuery();
> >>> >     String groupQuery = " (" + groups.stream().map(g -> "group:" +
> >>> g).collect
> >>> >             (Collectors.joining(" OR ")) + ")";
> >>> >     String finalQuery = "variable1:" + variable1 + " AND " +
> groupQuery;
> >>> >     query.set("q", finalQuery);
> >>> >     query.setRows(0);
> >>> >     metrics.forEach(
> >>> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
> >>> metric)
> >>> >     );
> >>> >     return query;
> >>> > }
> >>> >
> >>> > Any help will be appreciated regarding this.
> >>> >
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Raunak
> >>>
> >>
> >>
>

Re: Solr Aggregation queries are way slower than Elastic Search

Posted by Yonik Seeley <ys...@gmail.com>.
OK great, so it's definitely not the main query (which is just a
single term query in this example!)

> Also I have looked into the JSON Facet API. If I have to use facets, I will
> have to then define 3600 facets in a single query and I guess that would be
> also slow.

You can ask for any number of stats for a given facet (even the root
facet bucket w/o faceting on any fields):

cutl 'http://localhost:8983/solr/collection1.query?q=variable1:290&rows=0&json.facet={
  s1:"sum(metric_1)",
  s2:"sum(metric_2)",
  s3:"sum(metric_3)"
}'

-Yonik


On Tue, Dec 12, 2017 at 5:40 AM, RAUNAK AGRAWAL
<ag...@gmail.com> wrote:
> Hi Yonik,
>
> As you asked here is the code snippet and the actual solr query. Please
> have a look. I have included only 104 metrics but like this we can go upto
> 3600 rollups.
>
> Also I have looked into the JSON Facet API. If I have to use facets, I will
> have to then define 3600 facets in a single query and I guess that would be
> also slow. Also is there any max limit on the number of facets we can
> define in a single query?
>
> Code snippet:
>
> private SolrQuery buildQuery(Integer variable1, List<String> metrics) {
>     SolrQuery query = new SolrQuery();
>     query.set("q", "variable1:" + variable1);
>     query.setRows(0);
>     metrics.forEach(
>             metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
>     );
>     return query;
> }
>
>
> The generated query:
>
> {! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
> }metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
> }metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
> }metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
> }metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
> }metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
> }metric_11' stats.field='{!sum=true }metric_12'
> stats.field='{!sum=true }metric_13' stats.field='{!sum=true
> }metric_14' stats.field='{!sum=true }metric_15'
> stats.field='{!sum=true }metric_16' stats.field='{!sum=true
> }metric_17' stats.field='{!sum=true }metric_18'
> stats.field='{!sum=true }metric_19' stats.field='{!sum=true
> }metric_20' stats.field='{!sum=true }metric_21'
> stats.field='{!sum=true }metric_22' stats.field='{!sum=true
> }metric_23' stats.field='{!sum=true }metric_24'
> stats.field='{!sum=true }metric_25' stats.field='{!sum=true
> }metric_26' stats.field='{!sum=true }metric_27'
> stats.field='{!sum=true }metric_28' stats.field='{!sum=true
> }metric_29' stats.field='{!sum=true }metric_30'
> stats.field='{!sum=true }metric_31' stats.field='{!sum=true
> }metric_32' stats.field='{!sum=true }metric_33'
> stats.field='{!sum=true }metric_34' stats.field='{!sum=true
> }metric_35' stats.field='{!sum=true }metric_36'
> stats.field='{!sum=true }metric_37' stats.field='{!sum=true
> }metric_38' stats.field='{!sum=true }metric_39'
> stats.field='{!sum=true }metric_40' stats.field='{!sum=true
> }metric_41' stats.field='{!sum=true }metric_42'
> stats.field='{!sum=true }metric_43' stats.field='{!sum=true
> }metric_44' stats.field='{!sum=true }metric_45'
> stats.field='{!sum=true }metric_46' stats.field='{!sum=true
> }metric_47' stats.field='{!sum=true }metric_48'
> stats.field='{!sum=true }metric_49' stats.field='{!sum=true
> }metric_50' stats.field='{!sum=true }metric_51'
> stats.field='{!sum=true }metric_52' stats.field='{!sum=true
> }metric_53' stats.field='{!sum=true }metric_54'
> stats.field='{!sum=true }metric_55' stats.field='{!sum=true
> }metric_56' stats.field='{!sum=true }metric_57'
> stats.field='{!sum=true }metric_58' stats.field='{!sum=true
> }metric_59' stats.field='{!sum=true }metric_60'
> stats.field='{!sum=true }metric_61' stats.field='{!sum=true
> }metric_62' stats.field='{!sum=true }metric_63'
> stats.field='{!sum=true }metric_64' stats.field='{!sum=true
> }metric_65' stats.field='{!sum=true }metric_66'
> stats.field='{!sum=true }metric_67' stats.field='{!sum=true
> }metric_68' stats.field='{!sum=true }metric_69'
> stats.field='{!sum=true }metric_70' stats.field='{!sum=true
> }metric_71' stats.field='{!sum=true }metric_72'
> stats.field='{!sum=true }metric_73' stats.field='{!sum=true
> }metric_74' stats.field='{!sum=true }metric_75'
> stats.field='{!sum=true }metric_76' stats.field='{!sum=true
> }metric_77' stats.field='{!sum=true }metric_78'
> stats.field='{!sum=true }metric_79' stats.field='{!sum=true
> }metric_80' stats.field='{!sum=true }metric_81'
> stats.field='{!sum=true }metric_82' stats.field='{!sum=true
> }metric_83' stats.field='{!sum=true }metric_84'
> stats.field='{!sum=true }metric_85' stats.field='{!sum=true
> }metric_86' stats.field='{!sum=true }metric_87'
> stats.field='{!sum=true }metric_88' stats.field='{!sum=true
> }metric_89' stats.field='{!sum=true }metric_90'
> stats.field='{!sum=true }metric_91' stats.field='{!sum=true
> }metric_92' stats.field='{!sum=true }metric_93'
> stats.field='{!sum=true }metric_94' stats.field='{!sum=true
> }metric_95' stats.field='{!sum=true }metric_96'
> stats.field='{!sum=true }metric_97' stats.field='{!sum=true
> }metric_98' stats.field='{!sum=true }metric_99'
> stats.field='{!sum=true }metric_100' stats.field='{!sum=true
> }metric_101' stats.field='{!sum=true }metric_102'
> stats.field='{!sum=true }metric_103' stats.field='{!sum=true
> }metric_104'}
>
>
>
>
> On Tue, Dec 12, 2017 at 10:21 AM, RAUNAK AGRAWAL <ag...@gmail.com>
> wrote:
>
>> Hi Yonik,
>>
>> I will try the JSON Facet API and update here but my hunch is that
>> querying mechanism is not the problem. Rather the problem lies with the
>> solr aggregations.
>>
>> Thanks
>>
>> On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ys...@gmail.com> wrote:
>>
>>> I think the SolrJ below uses the old stats component.
>>> Hopefully the JSON Facet API would be faster for this, but it's not
>>> completely clear what the main query here looks like, and if it's the
>>> source of any bottleneck rather than the aggregations.
>>> What does the generated query string actually look like (it may be
>>> easiest just to pull this from the logs).
>>>
>>> -Yonik
>>>
>>>
>>> On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
>>> <ag...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > We have a use case where there are 4-5 dimensions and around 3500
>>> metrics
>>> > in a single document. We have indexed only 2 dimensions and made all the
>>> > metrics as doc_values so that we can run the aggregation queries.
>>> >
>>> > We have 6 million such documents and we are using solr cloud(6.6) on a 6
>>> > node cluster with 8 Vcores and 24 GB RAM each.
>>> >
>>> > On the same set of hardware in elastic search we were getting the
>>> response
>>> > in about 10ms whereas in solr we are getting response in around 300-400
>>> ms.
>>> >
>>> > This is how I am querying the data.
>>> >
>>> > private SolrQuery buildQuery(Integer variable1, List<Integer> groups,
>>> > List<String> metrics) {
>>> >     SolrQuery query = new SolrQuery();
>>> >     String groupQuery = " (" + groups.stream().map(g -> "group:" +
>>> g).collect
>>> >             (Collectors.joining(" OR ")) + ")";
>>> >     String finalQuery = "variable1:" + variable1 + " AND " + groupQuery;
>>> >     query.set("q", finalQuery);
>>> >     query.setRows(0);
>>> >     metrics.forEach(
>>> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
>>> metric)
>>> >     );
>>> >     return query;
>>> > }
>>> >
>>> > Any help will be appreciated regarding this.
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > Raunak
>>>
>>
>>

Re: Solr Aggregation queries are way slower than Elastic Search

Posted by RAUNAK AGRAWAL <ag...@gmail.com>.
Hi Yonik,

As you asked here is the code snippet and the actual solr query. Please
have a look. I have included only 104 metrics but like this we can go upto
3600 rollups.

Also I have looked into the JSON Facet API. If I have to use facets, I will
have to then define 3600 facets in a single query and I guess that would be
also slow. Also is there any max limit on the number of facets we can
define in a single query?

Code snippet:

private SolrQuery buildQuery(Integer variable1, List<String> metrics) {
    SolrQuery query = new SolrQuery();
    query.set("q", "variable1:" + variable1);
    query.setRows(0);
    metrics.forEach(
            metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
    );
    return query;
}


The generated query:

{! q=variable1:290 rows=0 stats=true stats.field='{!sum=true
}metric_1' stats.field='{!sum=true }metric_2' stats.field='{!sum=true
}metric_3' stats.field='{!sum=true }metric_4' stats.field='{!sum=true
}metric_5' stats.field='{!sum=true }metric_6' stats.field='{!sum=true
}metric_7' stats.field='{!sum=true }metric_8' stats.field='{!sum=true
}metric_9' stats.field='{!sum=true }metric_10' stats.field='{!sum=true
}metric_11' stats.field='{!sum=true }metric_12'
stats.field='{!sum=true }metric_13' stats.field='{!sum=true
}metric_14' stats.field='{!sum=true }metric_15'
stats.field='{!sum=true }metric_16' stats.field='{!sum=true
}metric_17' stats.field='{!sum=true }metric_18'
stats.field='{!sum=true }metric_19' stats.field='{!sum=true
}metric_20' stats.field='{!sum=true }metric_21'
stats.field='{!sum=true }metric_22' stats.field='{!sum=true
}metric_23' stats.field='{!sum=true }metric_24'
stats.field='{!sum=true }metric_25' stats.field='{!sum=true
}metric_26' stats.field='{!sum=true }metric_27'
stats.field='{!sum=true }metric_28' stats.field='{!sum=true
}metric_29' stats.field='{!sum=true }metric_30'
stats.field='{!sum=true }metric_31' stats.field='{!sum=true
}metric_32' stats.field='{!sum=true }metric_33'
stats.field='{!sum=true }metric_34' stats.field='{!sum=true
}metric_35' stats.field='{!sum=true }metric_36'
stats.field='{!sum=true }metric_37' stats.field='{!sum=true
}metric_38' stats.field='{!sum=true }metric_39'
stats.field='{!sum=true }metric_40' stats.field='{!sum=true
}metric_41' stats.field='{!sum=true }metric_42'
stats.field='{!sum=true }metric_43' stats.field='{!sum=true
}metric_44' stats.field='{!sum=true }metric_45'
stats.field='{!sum=true }metric_46' stats.field='{!sum=true
}metric_47' stats.field='{!sum=true }metric_48'
stats.field='{!sum=true }metric_49' stats.field='{!sum=true
}metric_50' stats.field='{!sum=true }metric_51'
stats.field='{!sum=true }metric_52' stats.field='{!sum=true
}metric_53' stats.field='{!sum=true }metric_54'
stats.field='{!sum=true }metric_55' stats.field='{!sum=true
}metric_56' stats.field='{!sum=true }metric_57'
stats.field='{!sum=true }metric_58' stats.field='{!sum=true
}metric_59' stats.field='{!sum=true }metric_60'
stats.field='{!sum=true }metric_61' stats.field='{!sum=true
}metric_62' stats.field='{!sum=true }metric_63'
stats.field='{!sum=true }metric_64' stats.field='{!sum=true
}metric_65' stats.field='{!sum=true }metric_66'
stats.field='{!sum=true }metric_67' stats.field='{!sum=true
}metric_68' stats.field='{!sum=true }metric_69'
stats.field='{!sum=true }metric_70' stats.field='{!sum=true
}metric_71' stats.field='{!sum=true }metric_72'
stats.field='{!sum=true }metric_73' stats.field='{!sum=true
}metric_74' stats.field='{!sum=true }metric_75'
stats.field='{!sum=true }metric_76' stats.field='{!sum=true
}metric_77' stats.field='{!sum=true }metric_78'
stats.field='{!sum=true }metric_79' stats.field='{!sum=true
}metric_80' stats.field='{!sum=true }metric_81'
stats.field='{!sum=true }metric_82' stats.field='{!sum=true
}metric_83' stats.field='{!sum=true }metric_84'
stats.field='{!sum=true }metric_85' stats.field='{!sum=true
}metric_86' stats.field='{!sum=true }metric_87'
stats.field='{!sum=true }metric_88' stats.field='{!sum=true
}metric_89' stats.field='{!sum=true }metric_90'
stats.field='{!sum=true }metric_91' stats.field='{!sum=true
}metric_92' stats.field='{!sum=true }metric_93'
stats.field='{!sum=true }metric_94' stats.field='{!sum=true
}metric_95' stats.field='{!sum=true }metric_96'
stats.field='{!sum=true }metric_97' stats.field='{!sum=true
}metric_98' stats.field='{!sum=true }metric_99'
stats.field='{!sum=true }metric_100' stats.field='{!sum=true
}metric_101' stats.field='{!sum=true }metric_102'
stats.field='{!sum=true }metric_103' stats.field='{!sum=true
}metric_104'}




On Tue, Dec 12, 2017 at 10:21 AM, RAUNAK AGRAWAL <ag...@gmail.com>
wrote:

> Hi Yonik,
>
> I will try the JSON Facet API and update here but my hunch is that
> querying mechanism is not the problem. Rather the problem lies with the
> solr aggregations.
>
> Thanks
>
> On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ys...@gmail.com> wrote:
>
>> I think the SolrJ below uses the old stats component.
>> Hopefully the JSON Facet API would be faster for this, but it's not
>> completely clear what the main query here looks like, and if it's the
>> source of any bottleneck rather than the aggregations.
>> What does the generated query string actually look like (it may be
>> easiest just to pull this from the logs).
>>
>> -Yonik
>>
>>
>> On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
>> <ag...@gmail.com> wrote:
>> > Hi,
>> >
>> > We have a use case where there are 4-5 dimensions and around 3500
>> metrics
>> > in a single document. We have indexed only 2 dimensions and made all the
>> > metrics as doc_values so that we can run the aggregation queries.
>> >
>> > We have 6 million such documents and we are using solr cloud(6.6) on a 6
>> > node cluster with 8 Vcores and 24 GB RAM each.
>> >
>> > On the same set of hardware in elastic search we were getting the
>> response
>> > in about 10ms whereas in solr we are getting response in around 300-400
>> ms.
>> >
>> > This is how I am querying the data.
>> >
>> > private SolrQuery buildQuery(Integer variable1, List<Integer> groups,
>> > List<String> metrics) {
>> >     SolrQuery query = new SolrQuery();
>> >     String groupQuery = " (" + groups.stream().map(g -> "group:" +
>> g).collect
>> >             (Collectors.joining(" OR ")) + ")";
>> >     String finalQuery = "variable1:" + variable1 + " AND " + groupQuery;
>> >     query.set("q", finalQuery);
>> >     query.setRows(0);
>> >     metrics.forEach(
>> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
>> metric)
>> >     );
>> >     return query;
>> > }
>> >
>> > Any help will be appreciated regarding this.
>> >
>> >
>> > Thanks,
>> >
>> > Raunak
>>
>
>

Re: Solr Aggregation queries are way slower than Elastic Search

Posted by RAUNAK AGRAWAL <ag...@gmail.com>.
Hi Yonik,

I will try the JSON Facet API and update here but my hunch is that querying
mechanism is not the problem. Rather the problem lies with the solr
aggregations.

Thanks

On Tue, Dec 12, 2017 at 6:31 AM, Yonik Seeley <ys...@gmail.com> wrote:

> I think the SolrJ below uses the old stats component.
> Hopefully the JSON Facet API would be faster for this, but it's not
> completely clear what the main query here looks like, and if it's the
> source of any bottleneck rather than the aggregations.
> What does the generated query string actually look like (it may be
> easiest just to pull this from the logs).
>
> -Yonik
>
>
> On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
> <ag...@gmail.com> wrote:
> > Hi,
> >
> > We have a use case where there are 4-5 dimensions and around 3500 metrics
> > in a single document. We have indexed only 2 dimensions and made all the
> > metrics as doc_values so that we can run the aggregation queries.
> >
> > We have 6 million such documents and we are using solr cloud(6.6) on a 6
> > node cluster with 8 Vcores and 24 GB RAM each.
> >
> > On the same set of hardware in elastic search we were getting the
> response
> > in about 10ms whereas in solr we are getting response in around 300-400
> ms.
> >
> > This is how I am querying the data.
> >
> > private SolrQuery buildQuery(Integer variable1, List<Integer> groups,
> > List<String> metrics) {
> >     SolrQuery query = new SolrQuery();
> >     String groupQuery = " (" + groups.stream().map(g -> "group:" +
> g).collect
> >             (Collectors.joining(" OR ")) + ")";
> >     String finalQuery = "variable1:" + variable1 + " AND " + groupQuery;
> >     query.set("q", finalQuery);
> >     query.setRows(0);
> >     metrics.forEach(
> >             metric -> query.setGetFieldStatistics("{!sum=true }" +
> metric)
> >     );
> >     return query;
> > }
> >
> > Any help will be appreciated regarding this.
> >
> >
> > Thanks,
> >
> > Raunak
>

Re: Solr Aggregation queries are way slower than Elastic Search

Posted by Yonik Seeley <ys...@gmail.com>.
I think the SolrJ below uses the old stats component.
Hopefully the JSON Facet API would be faster for this, but it's not
completely clear what the main query here looks like, and if it's the
source of any bottleneck rather than the aggregations.
What does the generated query string actually look like (it may be
easiest just to pull this from the logs).

-Yonik


On Mon, Dec 11, 2017 at 7:32 PM, RAUNAK AGRAWAL
<ag...@gmail.com> wrote:
> Hi,
>
> We have a use case where there are 4-5 dimensions and around 3500 metrics
> in a single document. We have indexed only 2 dimensions and made all the
> metrics as doc_values so that we can run the aggregation queries.
>
> We have 6 million such documents and we are using solr cloud(6.6) on a 6
> node cluster with 8 Vcores and 24 GB RAM each.
>
> On the same set of hardware in elastic search we were getting the response
> in about 10ms whereas in solr we are getting response in around 300-400 ms.
>
> This is how I am querying the data.
>
> private SolrQuery buildQuery(Integer variable1, List<Integer> groups,
> List<String> metrics) {
>     SolrQuery query = new SolrQuery();
>     String groupQuery = " (" + groups.stream().map(g -> "group:" + g).collect
>             (Collectors.joining(" OR ")) + ")";
>     String finalQuery = "variable1:" + variable1 + " AND " + groupQuery;
>     query.set("q", finalQuery);
>     query.setRows(0);
>     metrics.forEach(
>             metric -> query.setGetFieldStatistics("{!sum=true }" + metric)
>     );
>     return query;
> }
>
> Any help will be appreciated regarding this.
>
>
> Thanks,
>
> Raunak