Posted to solr-user@lucene.apache.org by Ryan Josal <ry...@josal.com> on 2015/01/27 01:34:16 UTC

An interesting approach to grouping

I have an index of products, and these products have a "category" which we
can say for now is a good approximation of its location in the store.  I'm
investigating altering the ordering of the results so that the categories
aren't interlaced as much... so that the results are a little bit more
grouped by category, but not *totally* grouped by category.  It's
interesting because it's an approach that sort of compares results to
near-scored/ranked results.  One of the hoped-for outcomes is that there
would be somewhat fewer categories represented in the top results for
a given query, although it is questionable whether this is a good measure
of the effectiveness of the implementation.

My first attempt was:

group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20))

Or some FunctionQuery like that, so that in order to become a member of a
group, the doc would have to have the same category and fall into the same
score bucket (scores scaled to 0-20 and rounded, in this case).  This doesn't
work out of the gate due to an NPE (Solr 4.10.2), although I'm not sure it
would work anyway:

java.lang.NullPointerException
    at org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValues(ScaleFloatFunction.java:104)
    at org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.java:1111)
    at org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollector.setNextReader(FunctionFirstPassGroupingCollector.java:82)
    at org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)
    at org.apache.solr.search.Grouping.execute(Grouping.java:368)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:459)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
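
To make the intended grouping concrete, here is a rough sketch in plain
Python, outside Solr.  It assumes that scale(x,0,20) maps the lowest and
highest observed scores linearly onto 0 and 20, and that rint rounds to the
nearest integer.  The function names and the sample documents are made up for
illustration; this is not how Solr implements grouping.

```python
from collections import defaultdict


def score_bucket(score, min_score, max_score, lo=0.0, hi=20.0):
    """Approximate rint(scale(score, lo, hi)) under the stated assumption.

    The score is mapped linearly from [min_score, max_score] onto [lo, hi]
    and rounded to the nearest integer (halves round to even).
    """
    if max_score == min_score:
        # Degenerate case: every document has the same score.
        return int(round(lo))
    scaled = lo + (score - min_score) * (hi - lo) / (max_score - min_score)
    return int(round(scaled))


def group_by_category_and_bucket(docs):
    """Group document ids by the pair (category, score bucket)."""
    scores = [d["score"] for d in docs]
    min_score, max_score = min(scores), max(scores)
    groups = defaultdict(list)
    for d in docs:
        key = (d["category"], score_bucket(d["score"], min_score, max_score))
        groups[key].append(d["id"])
    return dict(groups)


# Hypothetical sample data: three near-tied documents and one weak match.
docs = [
    {"id": "a", "category": "dairy", "score": 9.0},
    {"id": "b", "category": "dairy", "score": 8.9},
    {"id": "c", "category": "bakery", "score": 8.95},
    {"id": "d", "category": "dairy", "score": 1.0},
]
groups = group_by_category_and_bucket(docs)
```

Here "a" and "b" share a group because they have the same category and land
in the same bucket, "c" is in the same bucket but a different category, and
"d" is in the same category but a far lower bucket.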


Has anyone tried something like this before, and does anyone have any novel
ideas for how to approach it, no matter how different?  How about a
workaround for the group.func error here?  I'm very open-minded about where
to go on this one.

Thanks,
Ryan

Re: An interesting approach to grouping

Posted by "Jim.Musil" <Ji...@target.com>.

Here’s the issue:

https://issues.apache.org/jira/browse/SOLR-7046


Jim

On 1/27/15, 12:44 PM, "Ryan Josal" <rj...@gmail.com> wrote:

>This is great, thanks Jim.  Your patch worked and the sorting solution
>meets the goal, although group.limit seems like it could cut various
>results out of the middle of the result set.  I will play around with it
>and see if it proves helpful.  Can you let me know the Jira so I can keep
>an eye on it?
>
>Ryan

Re: An interesting approach to grouping

Posted by "Jim.Musil" <Ji...@target.com>.

Yes, I'm trying to pin down exactly which conditions cause the bug to
appear. So far it seems to occur only when using the query() function.

Jim

On 1/27/15, 12:44 PM, "Ryan Josal" <rj...@gmail.com> wrote:

>This is great, thanks Jim.  Your patch worked and the sorting solution
>meets the goal, although group.limit seems like it could cut various
>results out of the middle of the result set.  I will play around with it
>and see if it proves helpful.  Can you let me know the Jira so I can keep
>an eye on it?
>
>Ryan

Re: An interesting approach to grouping

Posted by Ryan Josal <rj...@gmail.com>.

This is great, thanks Jim.  Your patch worked and the sorting solution
meets the goal, although group.limit seems like it could cut various
results out of the middle of the result set.  I will play around with it
and see if it proves helpful.  Can you let me know the Jira so I can keep
an eye on it?

Ryan

On Tuesday, January 27, 2015, Jim.Musil <Ji...@target.com> wrote:

> Interestingly, you can do something like this:
>
> group=true&
> group.main=true&
> group.func=rint(scale(query({!type=edismax v=$q}),0,20))&  // puts into buckets
> group.limit=20&  // gives you 20 from each bucket
> group.sort=category asc  // sorts by category within each bucket; this can be a function as well
>
>
>
> Jim Musil

Re: An interesting approach to grouping

Posted by "Jim.Musil" <Ji...@target.com>.

Interestingly, you can do something like this:

group=true&
group.main=true&
group.func=rint(scale(query({!type=edismax v=$q}),0,20))&  // puts into buckets
group.limit=20&  // gives you 20 from each bucket
group.sort=category asc  // sorts by category within each bucket; this can be a function as well
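
The ordering these parameters aim for can be sketched in plain Python,
outside Solr.  The sketch assumes each document already carries its score
bucket, that buckets come back highest first, and that the flat result list
is the buckets concatenated in that order.  The function name, field names,
and sample data are hypothetical.

```python
from itertools import groupby


def regroup(docs, limit=20):
    """Return ids ordered by bucket (highest first), then by category.

    At most `limit` documents are kept per bucket, loosely mirroring
    group.limit; the category sort stands in for group.sort=category asc.
    """
    # sorted() is stable, so documents keep their incoming order within a bucket.
    ordered = sorted(docs, key=lambda d: -d["bucket"])
    result = []
    for _, members in groupby(ordered, key=lambda d: d["bucket"]):
        by_category = sorted(members, key=lambda d: d["category"])
        result.extend(d["id"] for d in by_category[:limit])
    return result


# Hypothetical sample data, listed in original relevance order.
docs = [
    {"id": "a", "category": "dairy", "bucket": 20},
    {"id": "c", "category": "bakery", "bucket": 20},
    {"id": "b", "category": "dairy", "bucket": 20},
    {"id": "e", "category": "bakery", "bucket": 12},
    {"id": "d", "category": "dairy", "bucket": 0},
]
full = regroup(docs)
trimmed = regroup(docs, limit=2)
```

In this sketch the interleaved dairy/bakery/dairy run at the top becomes
bakery, dairy, dairy.  With a limit of 2 the third document of the top bucket
is dropped while lower buckets are still returned, so a small limit removes
documents from inside the list rather than from its tail.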



Jim Musil



On 1/27/15, 10:14 AM, "Jim.Musil" <Ji...@target.com> wrote:

>When using group.main=true, the results of the grouping commands are not
>mixed as you expect:
>
>"If true, the result of the last field grouping command is used as the
>main result list in the response, using group.format=simple"
>
>https://wiki.apache.org/solr/FieldCollapsing
>
>
>Jim

Re: An interesting approach to grouping

Posted by "Jim.Musil" <Ji...@target.com>.

When using group.main=true, the results are not mixed as you expect:

"If true, the result of the last field grouping command is used as the
main result list in the response, using group.format=simple"

https://wiki.apache.org/solr/FieldCollapsing
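
To make the point concrete, here is a rough Java sketch (not Solr code). It assumes that each group.field / group.func parameter is evaluated as its own independent grouping command, and that a "simple" format flattens one command's groups into a single list. The class, the method names, and the sample documents are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class GroupMainDemo {
    /** Hypothetical document: an id, a category, and a precomputed score bucket. */
    public static class Doc {
        public final String id;
        public final String category;
        public final int bucket;

        public Doc(String id, String category, int bucket) {
            this.id = id;
            this.category = category;
            this.bucket = bucket;
        }
    }

    /** Runs one grouping command and flattens it: groups in first-seen order,
     *  at most `limit` docs per group, each group's docs kept together. */
    public static List<String> flatGrouped(List<Doc> ranked, Function<Doc, String> key, int limit) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Doc d : ranked) {
            List<String> group = groups.computeIfAbsent(key.apply(d), k -> new ArrayList<>());
            if (group.size() < limit) {
                group.add(d.id);
            }
        }
        List<String> flat = new ArrayList<>();
        for (List<String> group : groups.values()) {
            flat.addAll(group);
        }
        return flat;
    }

    /** Assumption: commands do not combine; the main list is the last command's result. */
    public static List<String> mainResult(List<Doc> ranked,
                                          List<Function<Doc, String>> commands, int limit) {
        List<String> main = new ArrayList<>();
        for (Function<Doc, String> command : commands) {
            main = flatGrouped(ranked, command, limit);
        }
        return main;
    }

    public static void main(String[] args) {
        List<Doc> ranked = Arrays.asList(
                new Doc("p1", "dairy", 20), new Doc("p2", "bakery", 20),
                new Doc("p3", "dairy", 20), new Doc("p4", "bakery", 20),
                new Doc("p5", "dairy", 19));
        Function<Doc, String> byCategory = d -> d.category;
        Function<Doc, String> byBucket = d -> String.valueOf(d.bucket);
        Function<Doc, String> combined = d -> d.category + "|" + d.bucket;

        // Two separate commands: only the bucket grouping reaches the main list.
        System.out.println(mainResult(ranked, Arrays.asList(byCategory, byBucket), 10));
        // prints [p1, p2, p3, p4, p5]

        // What the original request was hoping for: one key built from both values.
        System.out.println(flatGrouped(ranked, combined, 10));
        // prints [p1, p3, p2, p4, p5]
    }
}
```

Under those assumptions the category grouping never influences the main list, which is why a single function producing a combined key looks like the more promising direction.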


Jim

On 1/27/15, 9:22 AM, "Ryan Josal" <rj...@gmail.com> wrote:

>Thanks a lot!  I'll try this out later this morning.  If group.func and
>group.field don't combine the way I think they might, I'll try to look for
>a way to put it all in group.func.

Re: An interesting approach to grouping

Posted by Ryan Josal <rj...@gmail.com>.

Thanks a lot!  I'll try this out later this morning.  If group.func and
group.field don't combine the way I think they might, I'll try to look for
a way to put it all in group.func.
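
For reference, this is roughly the key I would want a single function to produce. It is a plain Java sketch of the arithmetic only, not a Solr function query: bucket() mimics rint(scale(score,0,20)) using a min and max that the caller supplies (a simplification of however scale() finds its range), and groupKey() and the "|" separator are made-up names and conventions.

```java
public class ScoreBucketKey {
    /** Rescales score from [min, max] to [0, buckets], then rounds to the nearest integer. */
    public static long bucket(double score, double min, double max, int buckets) {
        if (max == min) {
            return 0; // every score is identical, so everything shares one bucket
        }
        double scaled = (score - min) / (max - min) * buckets;
        return (long) Math.rint(scaled);
    }

    /** Hypothetical combined key: docs group together only if category AND bucket match. */
    public static String groupKey(String category, double score,
                                  double min, double max, int buckets) {
        return category + "|" + bucket(score, min, max, buckets);
    }

    public static void main(String[] args) {
        double min = 5.0;
        double max = 10.0;
        // Two near-scored dairy products land in the same group...
        System.out.println(groupKey("dairy", 10.0, min, max, 20));
        System.out.println(groupKey("dairy", 9.9, min, max, 20));
        // ...a bakery product with the same score does not...
        System.out.println(groupKey("bakery", 9.9, min, max, 20));
        // ...and neither does a low-scoring dairy product.
        System.out.println(groupKey("dairy", 5.0, min, max, 20));
    }
}
```

The number of buckets controls how aggressive the regrouping is: fewer buckets pull more of a category together, more buckets leave the ranking closer to pure score order.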

On Tuesday, January 27, 2015, Jim.Musil <Ji...@target.com> wrote:

> I'm not sure the query you provided will do what you want, BUT I did find
> the bug in the code that is causing the NullPointerException.
>
> The variable context is supposed to be global, but when prepare() is
> called, it is only defined in the scope of that function.
>
> Here's the simple patch:
>
> Index: core/src/java/org/apache/solr/search/Grouping.java
> ===================================================================
> --- core/src/java/org/apache/solr/search/Grouping.java  (revision 1653358)
> +++ core/src/java/org/apache/solr/search/Grouping.java  (working copy)
> @@ -926,7 +926,7 @@
>       */
>      @Override
>      protected void prepare() throws IOException {
> -      Map context = ValueSource.newContext(searcher);
> +      context = ValueSource.newContext(searcher);
>        groupBy.createWeight(context, searcher);
>        actualGroupsToFind = getMax(offset, numGroups, maxDoc);
>      }
>
>
> I'll search for a Jira issue and open one if I can't find it.
>
> Jim Musil

Re: An interesting approach to grouping

Posted by "Jim.Musil" <Ji...@target.com>.

I'm not sure the query you provided will do what you want, BUT I did find
the bug in the code that is causing the NullPointerException.

The variable context is supposed to be a member field shared across the
class, but prepare() declares a new local variable with the same name, so
the field is never assigned.

Here's the simple patch:

Index: core/src/java/org/apache/solr/search/Grouping.java
===================================================================
--- core/src/java/org/apache/solr/search/Grouping.java	(revision 1653358)
+++ core/src/java/org/apache/solr/search/Grouping.java	(working copy)
@@ -926,7 +926,7 @@
      */
     @Override
     protected void prepare() throws IOException {
-      Map context = ValueSource.newContext(searcher);
+      context = ValueSource.newContext(searcher);
       groupBy.createWeight(context, searcher);
       actualGroupsToFind = getMax(offset, numGroups, maxDoc);
     }
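
If the scoping issue is not obvious from the diff, here is a stripped-down Java sketch of the same mistake. Nothing in it is Solr code: the Command class, the map contents, and the method names are invented purely to show how a local declaration hides a field.

```java
import java.util.HashMap;
import java.util.Map;

public class ShadowedFieldDemo {
    /** Stand-in for a grouping command; all names here are hypothetical. */
    public static class Command {
        Map<String, Object> context; // the field that later code relies on

        /** Buggy variant: the local declaration shadows the field, which stays null. */
        public void prepareBuggy() {
            Map<String, Object> context = new HashMap<>();
            context.put("searcher", "searcher-1");
        }

        /** Fixed variant: assigns the field itself, as the patch does. */
        public void prepareFixed() {
            context = new HashMap<>();
            context.put("searcher", "searcher-1");
        }

        /** Later stage that reads the field; throws NullPointerException if it was never set. */
        public Object searcher() {
            return context.get("searcher");
        }
    }

    public static void main(String[] args) {
        Command buggy = new Command();
        buggy.prepareBuggy();
        try {
            buggy.searcher();
            System.out.println("buggy: no exception");
        } catch (NullPointerException e) {
            System.out.println("buggy: NullPointerException");
        }

        Command fixed = new Command();
        fixed.prepareFixed();
        System.out.println("fixed: " + fixed.searcher());
    }
}
```

The one-word change in the patch corresponds to the difference between prepareBuggy() and prepareFixed() here.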


I'll search for a Jira issue and open one if I can't find it.

Jim Musil



On 1/26/15, 6:34 PM, "Ryan Josal" <ry...@josal.com> wrote:

>I have an index of products, and these products have a "category" which we
>can say for now is a good approximation of its location in the store.  I'm
>investigating altering the ordering of the results so that the categories
>aren't interlaced as much... so that the results are a little bit more
>grouped by category, but not *totally* grouped by category.  It's
>interesting because it's an approach that sort of compares results to
>near-scored/ranked results.  One of the hoped-for outcomes of this would be
>that there would be somewhat fewer categories represented in the top results
>for a given query, although it is questionable if this is a good measurement
>to determine the effectiveness of the implementation.
>
>My first attempt was to
>group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20))
>
>Or some FunctionQuery like that, so that in order to become a member of a
>group, the doc would have to have the same category, and be dropped into
>the same score bucket (20 in this case).  This doesn't work out of the gate
>due to an NPE (solr 4.10.2) (although I'm not sure it would work anyway):
>
>java.lang.NullPointerException\n\tat
>org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValues(ScaleFloatFunction.java:104)\n\tat
>org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.java:1111)\n\tat
>org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollector.setNextReader(FunctionFirstPassGroupingCollector.java:82)\n\tat
>org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)\n\tat
>org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)\n\tat
>org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)\n\tat
>org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)\n\tat
>org.apache.solr.search.Grouping.execute(Grouping.java:368)\n\tat
>org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:459)\n\tat
>org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)\n\tat
>
>
>Has anyone tried something like this before, and does anyone have any novel
>ideas for how to approach it, no matter how different?  How about a
>workaround for the group.func error here?  I'm very open-minded about where
>to go on this one.
>
>Thanks,
>Ryan