Posted to solr-user@lucene.apache.org by entdeveloper <ca...@gmail.com> on 2011/10/04 02:00:53 UTC

Selective Result Grouping

I'd like to suggest the ability to collapse results in a way closer to the
old SOLR-236 patch, which the current grouping functionality doesn't
provide. I need the ability to collapse only certain results based on the
value of a field, leaving all other results intact.

As an example, consider the following documents:
ID     TYPE
1       doc
2       image
3       image
4       doc

My desired behavior is to collapse results where TYPE:image, producing a
result set like the following:
1
2 (collapsed, count=2)
4

Currently, when using the Result Grouping feature, I can only produce the
result set below, in which every document is grouped by its TYPE value:
1 (grouped, count=2)
2 (grouped, count=2)
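
To make the difference concrete, here is a minimal Python sketch of the two
behaviours on the four example documents. It is only an illustration of the
intended semantics, not Solr code; the function names group_all and
collapse_selected are made up for this example.

```python
def group_all(docs, field):
    """All-or-nothing grouping: every doc joins the group for its field value."""
    groups = {}  # dicts keep insertion order, so groups stay in rank order
    for doc in docs:
        head = groups.setdefault(doc[field], {"id": doc["id"], "count": 0})
        head["count"] += 1
    return list(groups.values())


def collapse_selected(docs, field, value):
    """Collapse only docs whose field equals value; pass the rest through."""
    results = []
    head = None  # first matching doc, which represents the collapsed group
    for doc in docs:
        if doc[field] != value:
            results.append({"id": doc["id"], "count": 1})
        elif head is None:
            head = {"id": doc["id"], "count": 1}
            results.append(head)
        else:
            head["count"] += 1
    return results


docs = [
    {"id": 1, "type": "doc"},
    {"id": 2, "type": "image"},
    {"id": 3, "type": "image"},
    {"id": 4, "type": "doc"},
]

current = group_all(docs, "type")                    # ids 1 and 2, count 2 each
desired = collapse_selected(docs, "type", "image")   # ids 1, 2 (count 2), 4
```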

I'd like to propose repurposing the 'group.query' parameter to achieve this
behavior. Currently, the group.query parameter behaves exactly like an 'fq'
(at least in terms of the results that are produced). I have yet to come up
with a scenario where the effect of group.query could not be achieved by
using the other group params and fq.

I'm hoping to collect some thoughts on the subject before submitting a
ticket to JIRA. Thoughts?

--
View this message in context: http://lucene.472066.n3.nabble.com/Selective-Result-Grouping-tp3391538p3391538.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Selective Result Grouping

Posted by entdeveloper <ca...@gmail.com>.
Created an issue in JIRA for this feature:
https://issues.apache.org/jira/browse/SOLR-2884


Martijn v Groningen-2 wrote:
> 
> Ok I think I get this. I think this can be achieved if one could
> specify a filter inside a group and only documents that pass the
> filter get grouped. For example only group documents with the value
> image for the mimetype field. This filter should be specified per
> group command. Maybe we should open an issue for this?
> 

--
View this message in context: http://lucene.472066.n3.nabble.com/Selective-Result-Grouping-tp3391538p3491886.html

Re: Selective Result Grouping

Posted by Martijn v Groningen <ma...@gmail.com>.
Ok I think I get this. I think this can be achieved if one could
specify a filter inside a group and only documents that pass the
filter get grouped. For example only group documents with the value
image for the mimetype field. This filter should be specified per
group command. Maybe we should open an issue for this?

Martijn

On 1 November 2011 19:58, entdeveloper <ca...@gmail.com> wrote:
>
> Martijn v Groningen-2 wrote:
>>
>> When using the group.field option, values must be identical; otherwise
>> they don't get grouped together. Maybe fuzzy grouping would be nice.
>> Grouping videos and images based on mimetype should be easy, right?
>> Videos have a mimetype that starts with video/ and images have a
>> mimetype that starts with image/. Storing the mimetype's type and
>> subtype in separate fields and grouping on the type field would do the
>> job. Of course you need to know the mimetype during indexing, but
>> solutions like Apache Tika can do that for you.
>
> Not necessarily interested in grouping by mimetype (that's an analysis
> issue). I simply used videos and images as an example.
>
> I'm not sure what you mean by fuzzy grouping. But my goal is to have
> collapse be more selective somehow on what gets grouped. As a more specific
> example, I have a field called 'type', with the following possible field
> values:
>
> Type
> ------
> image
> video
> webpage
>
>
> Basically I want to be able to collapse all the images into a single result
> so that they don't fill up the first page of the results. This is not
> possible with the current grouping implementation because if you call
> group.field=type, it'll group everything. I do not want to collapse videos
> or webpages, only images.
>
> I've attached a screenshot of google's srp to help explain what I mean.
>
> http://lucene.472066.n3.nabble.com/file/n3471548/Screen_Shot_2011-11-01_at_11.52.04_AM.png
>
> Hopefully that makes more sense. If it's still not clear I can email you
> privately.



-- 
Kind regards,

Martijn van Groningen

Re: Selective Result Grouping

Posted by entdeveloper <ca...@gmail.com>.
Martijn v Groningen-2 wrote:
> 
> When using the group.field option, values must be identical; otherwise
> they don't get grouped together. Maybe fuzzy grouping would be nice.
> Grouping videos and images based on mimetype should be easy, right?
> Videos have a mimetype that starts with video/ and images have a
> mimetype that starts with image/. Storing the mimetype's type and
> subtype in separate fields and grouping on the type field would do the
> job. Of course you need to know the mimetype during indexing, but
> solutions like Apache Tika can do that for you.

Not necessarily interested in grouping by mimetype (that's an analysis
issue). I simply used videos and images as an example.

I'm not sure what you mean by fuzzy grouping. My goal is to make collapsing
more selective about what gets grouped. As a more specific example, I have a
field called 'type', with the following possible field values:

Type
------
image
video
webpage


Basically I want to be able to collapse all the images into a single result
so that they don't fill up the first page of the results. This is not
possible with the current grouping implementation because if you set
group.field=type, it'll group everything. I do not want to collapse videos
or webpages, only images.
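
For reference, a request of the kind I'm describing could be built as in the
Python sketch below. The group and group.field parameters are the existing
Result Grouping parameters; the host, port and path in SOLR_SELECT_URL and
the function name are assumptions made up for this example.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host and core path for a real installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def grouped_query_url(query, group_field):
    """Build a select URL that groups every matching doc by group_field."""
    params = {
        "q": query,
        "group": "true",
        "group.field": group_field,
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Groups images, videos and webpages alike; there is no way to say
# "only collapse the images" with these parameters.
url = grouped_query_url("*:*", "type")
```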

I've attached a screenshot of Google's SRP (search results page) to help
explain what I mean.

http://lucene.472066.n3.nabble.com/file/n3471548/Screen_Shot_2011-11-01_at_11.52.04_AM.png 

Hopefully that makes more sense. If it's still not clear I can email you
privately.

--
View this message in context: http://lucene.472066.n3.nabble.com/Selective-Result-Grouping-tp3391538p3471548.html

Re: Selective Result Grouping

Posted by Martijn v Groningen <ma...@gmail.com>.
> The current grouping functionality using group.field is basically
> all-or-nothing: all documents will be grouped by the field value or none
> will. So there would be no way to, for example, collapse just the videos or
> images the way Google does.
When using the group.field option, values must be identical; otherwise
they don't get grouped together. Maybe fuzzy grouping would be nice.
Grouping videos and images based on mimetype should be easy, right?
Videos have a mimetype that starts with video/ and images have a
mimetype that starts with image/. Storing the mimetype's type and
subtype in separate fields and grouping on the type field would do the
job. Of course you need to know the mimetype during indexing, but
solutions like Apache Tika can do that for you.
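
A rough Python sketch of the index-time split, assuming the mimetype is
already known as a string. The field names mimetype_type and
mimetype_subtype are hypothetical; any schema names would do.

```python
def split_mimetype(mimetype):
    """Split 'type/subtype' into two values suitable for separate fields."""
    # Drop optional parameters such as '; charset=utf-8' before splitting.
    base = mimetype.split(";", 1)[0].strip().lower()
    type_, _, subtype = base.partition("/")
    return {"mimetype_type": type_, "mimetype_subtype": subtype}


# Merge the extra fields into the document before sending it to the index.
doc = {"id": 2, "mimetype": "image/png"}
doc.update(split_mimetype(doc["mimetype"]))
```

Grouping on the type field would then put every image/* document in one
group and every video/* document in another.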

-- 
Kind regards,

Martijn van Groningen

Re: Selective Result Grouping

Posted by entdeveloper <ca...@gmail.com>.
Not necessarily collapse.type=adjacent. That only applies when two docs with
the same field value appear next to each other. I'm more concerned with the
case where we only want to group docs of a certain type (no matter where the
subsequent docs appear), leaving the rest of the documents ungrouped.

The current grouping functionality using group.field is basically
all-or-nothing: all documents will be grouped by the field value or none
will. So there would be no way to, for example, collapse just the videos or
images the way Google does.

You're correct that it would be difficult to support this in a sharded
environment, but like most other features, it could be made available for a
single shard first, with support for sharded environments added later.

--
View this message in context: http://lucene.472066.n3.nabble.com/Selective-Result-Grouping-tp3391538p3429618.html

Re: Selective Result Grouping

Posted by Martijn v Groningen <ma...@gmail.com>.
So if I look at the old SOLR-236 field collapsing
(http://wiki.apache.org/solr/FieldCollapsingUncommitted), you mean
collapse.type=adjacent?

I think we shouldn't change the group.query parameter, since it serves a
different purpose. I think it is better to have a new parameter for
this different way of grouping:
group.adjacent=[fieldname|function]

I also think it is difficult to support this feature in a sharded
environment, since merging the groups depends on the position of
documents inside the result list.
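
A small Python sketch of why position matters, under the assumption that
adjacent collapsing means "merge runs of neighbouring docs with the same
field value". This is not how Solr merges shard responses; the function
names and the two example shards are made up for illustration.

```python
import heapq
from itertools import groupby


def collapse_adjacent(docs, field):
    """Collapse each run of neighbouring docs sharing the same field value."""
    collapsed = []
    for _, run in groupby(docs, key=lambda d: d[field]):
        run = list(run)
        head = dict(run[0])      # the first doc of the run represents it
        head["count"] = len(run)
        collapsed.append(head)
    return collapsed


def merge_by_score(*ranked_lists):
    """Merge lists that are each sorted by descending score."""
    return list(heapq.merge(*ranked_lists, key=lambda d: -d["score"]))


shard_a = [
    {"id": 1, "type": "image", "score": 0.9},
    {"id": 3, "type": "doc", "score": 0.5},
]
shard_b = [
    {"id": 2, "type": "image", "score": 0.8},
    {"id": 4, "type": "image", "score": 0.4},
]

# Collapsing the globally merged list: docs 1 and 2 are neighbours.
global_view = collapse_adjacent(merge_by_score(shard_a, shard_b), "type")

# Collapsing on each shard first: docs 2 and 4 look like neighbours on
# shard_b, although doc 3 sits between them in the global ranking.
per_shard = merge_by_score(
    collapse_adjacent(shard_a, "type"),
    collapse_adjacent(shard_b, "type"),
)
```

The two results differ, because adjacency on a single shard says nothing
about adjacency in the merged list.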

Martijn

On 4 October 2011 02:00, entdeveloper <ca...@gmail.com> wrote:
> I'd like to suggest the ability to collapse results in a way closer to the
> old SOLR-236 patch, which the current grouping functionality doesn't
> provide. I need the ability to collapse only certain results based on the
> value of a field, leaving all other results intact.
>
> As an example, consider the following documents:
> ID     TYPE
> 1       doc
> 2       image
> 3       image
> 4       doc
>
> My desired behavior is to collapse results where TYPE:image, producing a
> result set like the following:
> 1
> 2 (collapsed, count=2)
> 4
>
> Currently, when using the Result Grouping feature, I only have the ability
> to produce the result set below
> 1 (grouped, count=2)
> 2 (grouped, count=2)
>
> I'd like to propose repurposing the 'group.query' parameter to achieve this
> behavior. Currently, the group.query parameter behaves exactly like an 'fq'
> (at least in terms of the results that are produced). I have yet to come up
> with a scenario where the group.query could not be accomplished by using the
> other group params and fq.
>
> I'm hoping to collect some thoughts on the subject before submitting a
> ticket to jira. Thoughts?



-- 
Kind regards,

Martijn van Groningen