Posted to solr-user@lucene.apache.org by Roshni Rajagopal <ro...@gmail.com> on 2016/09/08 00:31:09 UTC

Solr Grouping, Aggregations and Custom Functions

Hi Solr Gurus,

I have these requirements:

1. Group data in Solr on multiple fields and compute aggregations
like SUM(field).

2. Compute custom calculations, such as sum(field1)/sum(field2), on
the grouped data.

Options I've tried:

1. Group - this does not allow grouping by more than one field, and
aggregations are not supported.

2. Stats - combined with facet.pivot, this gets results for basic group
aggregations like SUM. Custom calculations are not supported. Also, the
format is messy, with stats calculated at every level, and the results
cannot be paginated.

3. Facet JSON API - gets results for basic group aggregations like SUM.
The format is less messy and we can paginate. A custom calculation like
DIV(sum(field1), sum(field2)) is still not supported.
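
For readers following along, the multi-field grouping from option 3 can be sketched as a JSON Facet API request body with nested terms facets. This is only an illustration: the field names `category`, `region`, `field1` and `field2` are hypothetical, and the helper `build_nested_sum_request` is made up here. It only builds the payload and does not contact a server.

```python
import json


def build_nested_sum_request(group_fields, sum_fields, limit=10, offset=0):
    """Build a JSON Facet API body that nests one terms facet per group field.

    The innermost facet carries one sum() aggregation per entry in sum_fields.
    All field names are supplied by the caller; none come from a real schema.
    """
    # Innermost level: the aggregations, e.g. {"sum_field1": "sum(field1)"}.
    level = {f"sum_{name}": f"sum({name})" for name in sum_fields}

    # Wrap from the innermost group field outwards.
    for depth, field in enumerate(reversed(group_fields)):
        is_outermost = depth == len(group_fields) - 1
        facet = {"type": "terms", "field": field, "facet": level}
        if is_outermost:
            # Pagination is applied to the top-level buckets only.
            facet["limit"] = limit
            facet["offset"] = offset
        level = {f"by_{field}": facet}

    # limit=0 at the top level suppresses the document list; only facets return.
    return {"query": "*:*", "limit": 0, "facet": level}


body = build_nested_sum_request(["category", "region"], ["field1", "field2"])
print(json.dumps(body, indent=2))
```

The ratio sum(field1)/sum(field2) is deliberately absent from the request, since that is the part reported as unsupported above.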

So the last resort is the /sql handler for parallel queries. Is it tested and
stable, and will it meet my requirements? I'm on Solr 6.10.

Or would you recommend adding Spark? I would prefer to handle all
requirements in Solr, as I don't want to maintain Spark as another moving
part.

Do advise!

Regards

Roshni

Re: Solr Grouping, Aggregations and Custom Functions

Posted by Roshni <ro...@gmail.com>.
Hi Joel, thanks for responding.

For full-fledged data analytics powered by Solr, group-by and
aggregations are needed. The basic aggregations are available, but we often
have calculated fields like the one I mentioned, sum(a)/sum(b). It would be
cool to have these in Solr. Such calculations cannot be persisted in the raw
data because they depend on the filters and on the first set of sum
aggregations.

1. When do you think this support may be available? In 6.x?
2. As of today, what are my options if I still want to use Solr? Would using
Spark or Zeppelin over Solr help me with this custom calculation? Or perhaps
I can use Java to retrieve the grouped sums from the Solr API and then do the
custom calculation on the fly. That may slow it down. Which approach would
you recommend?
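
The "retrieve the grouped sums and divide on the fly" idea can be sketched as below. The thread mentions Java; Python is used here only for brevity. This is a minimal sketch under assumptions: the helper `add_ratio`, the key names `sum_field1` and `sum_field2`, and the sample response are all made up, with the sample shaped like a JSON Facet API response.

```python
def add_ratio(buckets, numerator, denominator, name="ratio"):
    """Return new bucket dicts with numerator/denominator stored under `name`.

    A bucket whose numerator is missing, or whose denominator is missing or
    zero, gets None instead of a number.
    """
    result = []
    for bucket in buckets:
        num = bucket.get(numerator)
        den = bucket.get(denominator)
        ratio = num / den if num is not None and den else None
        result.append({**bucket, name: ratio})
    return result


# Made-up response fragment, not output from a real Solr instance.
sample = {
    "facets": {
        "count": 5,
        "by_category": {
            "buckets": [
                {"val": "a", "count": 3, "sum_field1": 30.0, "sum_field2": 12.0},
                {"val": "b", "count": 2, "sum_field1": 7.0, "sum_field2": 0.0},
            ]
        },
    }
}

rows = add_ratio(sample["facets"]["by_category"]["buckets"],
                 "sum_field1", "sum_field2")
for row in rows:
    print(row["val"], row["ratio"])
```

The division runs once per returned bucket, so its cost scales with the page of groups fetched rather than with the number of documents matched.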




> Parallel SQL only supports the following functions currently: (SUM, AVG,
> MIN, MAX, COUNT).
>
> More functions and compound functions are on the roadmap.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/






Re: Solr Grouping, Aggregations and Custom Functions

Posted by Praveen Babu <su...@gmail.com>.
Hi Joel Bernstein,

Thanks for the update. If you get a chance to provide that feature soon,
it will be of great benefit to Solr users.


Regards,
S.Praveen
Technical Architect
LinkedIn:
https://www.linkedin.com/in/praveen-babu-73232889?trk=nav_responsive_tab_profile




On Thu, Sep 8, 2016 at 5:30 PM, Joel Bernstein <jo...@gmail.com> wrote:

> Parallel SQL only supports the following functions currently: (SUM, AVG,
> MIN, MAX, COUNT).
>
> More functions and compound functions are on the roadmap.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

Re: Solr Grouping, Aggregations and Custom Functions

Posted by Joel Bernstein <jo...@gmail.com>.
Parallel SQL only supports the following functions currently: (SUM, AVG,
MIN, MAX, COUNT).

More functions and compound functions are on the roadmap.
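
As an illustration of staying within the supported functions, the sketch below builds a GROUP BY request that asks only for plain SUM aggregates and leaves any ratio to the client. It is a sketch under assumptions: the helper `build_sql_request`, the collection name `collection1`, the host, and the field names are hypothetical. The `stmt` and `aggregationMode` parameter names follow the Parallel SQL documentation for Solr 6 as I understand it and should be checked against the version in use. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_sql_request(base_url, collection, group_fields, sum_fields,
                      aggregation_mode="map_reduce"):
    """Return (url, form_body) for a GROUP BY query against the /sql handler.

    Only plain SUM aggregates are requested; compound expressions such as
    sum(a)/sum(b) are left to the client.
    """
    select_list = list(group_fields) + [f"sum({f})" for f in sum_fields]
    stmt = (
        f"SELECT {', '.join(select_list)} "
        f"FROM {collection} "
        f"GROUP BY {', '.join(group_fields)}"
    )
    # aggregationMode is assumed to accept "map_reduce" or "facet".
    query = urlencode({"aggregationMode": aggregation_mode})
    url = f"{base_url}/{collection}/sql?{query}"
    body = urlencode({"stmt": stmt})
    return url, body


url, body = build_sql_request(
    "http://localhost:8983/solr",   # hypothetical host
    "collection1",                  # hypothetical collection
    ["category", "region"],
    ["field1", "field2"],
)
print(url)
print(body)
```

How the aggregate columns are named in the returned tuples is not assumed here; inspect a real response before reading the sums out of it.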

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Sep 8, 2016 at 12:11 AM, Praveen Babu <su...@gmail.com>
wrote:

> Hi All,
>
> I am also new to Solr. I have gone through the Solr documentation and tested
> aggregations using Solr's Presto-based Parallel SQL and Stream.
>
> I am getting a very good response from these two technologies. My worry is
> that I am unable to group by a multivalued field, which the standard Solr
> API supports but the latest version of Solr Presto/Stream does not.
>
> I want to aggregate/group by the "app.name" field using Stream or Parallel
> SQL. Please suggest.
>
> input:
>
> {
> id: 1
> field1:[1,2,3],
> app.name:[watsapp,facebook,... ]
> }
>
> {
> id: 2
> field1:[1,2,3],
> app.name:[watsapp,facebook,... ]
> }
>
> Expected result:
>
> watsapp: 2
> facebook: 2
>
> I have 2 TB of data and want to execute in aggmode=map_reduce. Any
> suggestions?
>
> Regards,
> S.Praveen
> Technical Architect
> LinkedIn:
> https://www.linkedin.com/in/praveen-babu-73232889?trk=nav_responsive_tab_profile

Re: Solr Grouping, Aggregations and Custom Functions

Posted by Praveen Babu <su...@gmail.com>.
Hi All,

I am also new to Solr. I have gone through the Solr documentation and tested
aggregations using Solr's Presto-based Parallel SQL and Stream.

I am getting a very good response from these two technologies. My worry is
that I am unable to group by a multivalued field, which the standard Solr
API supports but the latest version of Solr Presto/Stream does not.

I want to aggregate/group by the "app.name" field using Stream or Parallel
SQL. Please suggest.

input:

{
id: 1
field1:[1,2,3],
app.name:[watsapp,facebook,... ]
}

{
id: 2
field1:[1,2,3],
app.name:[watsapp,facebook,... ]
}

Expected result:

watsapp: 2
facebook: 2
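
The counting semantics behind the expected result can be pinned down with a small in-memory sketch. This only illustrates what "group by a multivalued field" means here (each document counts once for every distinct value it holds). It is not a way to process 2 TB, and it does not use Solr at all. The helper name `count_docs_per_value` is made up, and the elided values ("...") in the sample are left out rather than guessed.

```python
from collections import Counter


def count_docs_per_value(docs, field):
    """Count how many documents contain each value of a multivalued field."""
    counts = Counter()
    for doc in docs:
        # set() ensures a value repeated inside one document is counted once.
        counts.update(set(doc.get(field, [])))
    return counts


docs = [
    {"id": 1, "field1": [1, 2, 3], "app.name": ["watsapp", "facebook"]},
    {"id": 2, "field1": [1, 2, 3], "app.name": ["watsapp", "facebook"]},
]

counts = count_docs_per_value(docs, "app.name")
print(counts["watsapp"], counts["facebook"])  # prints: 2 2
```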


I have 2 TB of data and want to execute in aggmode=map_reduce. Any
suggestions?



Regards,
S.Praveen
Technical Architect
LinkedIn:
https://www.linkedin.com/in/praveen-babu-73232889?trk=nav_responsive_tab_profile



