Posted to solr-dev@lucene.apache.org by "Emmanuel Keller (JIRA)" <ji...@apache.org> on 2007/05/12 00:14:16 UTC

[jira] Created: (SOLR-236) Field collapsing

Field collapsing
----------------

                 Key: SOLR-236
                 URL: https://issues.apache.org/jira/browse/SOLR-236
             Project: Solr
          Issue Type: New Feature
          Components: search
    Affects Versions: 1.2
            Reporter: Emmanuel Keller


This patch includes a new feature called "Field collapsing".

"Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
"collapse" set to true to enable collapsing
"collapse.field" to choose the field used to group results
"collapse.max" to select how many consecutive results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases
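A minimal sketch of what "collapse.max" is described as doing, assuming it limits runs of consecutive results that share the same collapse field value (the "adjacent" behavior that comes up later in this thread). The class and method names are hypothetical and not taken from any of the patches; results are represented only by their collapse field values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of adjacent collapsing; not code from the SOLR-236 patches.
final class AdjacentCollapseSketch {

    // fieldValues holds the collapse field value of each result, in result order
    // (assumed non-null). Returns the positions of the results that survive:
    // at most `max` consecutive results with the same value are kept, and the
    // rest of that run is collapsed away.
    static List<Integer> collapse(List<String> fieldValues, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int runLength = 0;
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            if (value.equals(previous)) {
                runLength++;
            } else {
                previous = value;
                runLength = 1;
            }
            if (runLength <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("a.com", "a.com", "a.com", "b.com", "a.com");
        System.out.println(collapse(sites, 1)); // [0, 3, 4]
        System.out.println(collapse(sites, 2)); // [0, 1, 3, 4]
    }
}
```

Because only neighboring results are compared, a.com shows up again at position 4; grouping every result with the same value regardless of position needs a different approach.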


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (SOLR-236) Field collapsing

Posted by Jay Hill <ja...@gmail.com>.
Awesome, that did the trick. Thanks Martijn!

-Jay


On Sat, Jul 25, 2009 at 5:58 AM, Martijn van Groningen (JIRA) <
jira@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Martijn van Groningen updated SOLR-236:
> ---------------------------------------
>
>     Attachment: field-collapse-3.patch
>
> Hey Jay, I have fixed this issue in the new patch. So if you apply the new
> patch everything should be fine.
> The compile error was a result of the upgrade of the Lucene libraries in
> Solr. Because of LUCENE-1630 a new method was added to the Collector class.
> In this patch I also replaced the calls to ExtendedFieldCache methods
> with calls to FieldCache methods. ExtendedFieldCache is now deprecated
> in the updated Lucene libraries. If you have any problems with this patch
> let me know.
>
> Important:
> Only use this patch from revision 794328 (07/15/2009) and up. Use the
> previous patch if you are using an older 1.4-dev revision.
>
> > Field collapsing
> > ----------------
> >
> >                 Key: SOLR-236
> >                 URL: https://issues.apache.org/jira/browse/SOLR-236
> >             Project: Solr
> >          Issue Type: New Feature
> >          Components: search
> >    Affects Versions: 1.3
> >            Reporter: Emmanuel Keller
> >             Fix For: 1.5
> >
> >         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch,
> collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch,
> SOLR-236_collapsing.patch
> >
> >
> > This patch includes a new feature called "Field collapsing".
> > "Used in order to collapse a group of results with similar value for a
> given field to a single entry in the result set. Site collapsing is a
> special case of this, where all results for a given web site is collapsed
> into one or two entries in the result set, typically with an associated
> "more documents from this site" link. See also Duplicate detection."
> > http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> > The implementation adds 3 new query parameters (SolrParams):
> > "collapse.field" to choose the field used to group results
> > "collapse.type" normal (default value) or adjacent
> > "collapse.max" to select how many continuous results are allowed before
> collapsing
> > TODO (in progress):
> > - More documentation (on source code)
> > - Test cases
> > Two patches:
> > - "field_collapsing.patch" for current development version
> > - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> > P.S.: Feedback and misspelling corrections are welcome ;-)
>

Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Trey <so...@gmail.com>.
I also think the isTokenized() check/exception should be removed.  It is
probably a common use-case to have a single-valued "tokenized" field - i.e.
a case insensitive string (a text field where the only filter applied is a
LowerCaseFilterFactory).  I think that as long as it's documented that field
collapsing "doesn't work" for fields with multiple tokens then it shouldn't
be an issue.  That certainly seems better to me than preventing a perfectly
valid use case, since you wouldn't get any results anyway.


 if (schemaField.getType().isTokenized()) {
   throw new RuntimeException(
       "Could not collapse, because collapse field is tokenized");
 }

I agree that it would be better to "check" if the field has multiple values
or not.  In the meantime, though, perhaps the "remove the check and log a
warning" approach would suffice?
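To make the trade-off concrete, the sketch below simulates in plain Java a per-document lookup that stores one term per document, built by visiting the terms of the collapse field in sorted order and letting a later term overwrite an earlier one. That overwrite rule is an assumption chosen to match Martijn's recollection, in the comment quoted below, that only one (the last) token ends up being used; it is not a description of Lucene's actual FieldCache code. The same pass notices documents that received more than one term, so it can log a warning (the approach suggested above) instead of rejecting every tokenized field. The class name and warning text are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.logging.Logger;

// Hypothetical simulation; not Lucene's FieldCache and not the SOLR-236 patch.
final class TokenizedCollapseSketch {
    private static final Logger LOG =
            Logger.getLogger(TokenizedCollapseSketch.class.getName());

    // postings maps each indexed term of the collapse field to the docs containing it.
    // Terms are visited in sorted order and a later term overwrites an earlier
    // one, so a multi-token document ends up keyed by its last term in sort
    // order. Such documents are reported with a warning instead of an exception.
    static Map<Integer, String> collapseKeys(TreeMap<String, List<Integer>> postings) {
        Map<Integer, String> keyByDoc = new HashMap<>();
        Set<Integer> multiToken = new TreeSet<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (keyByDoc.put(doc, entry.getKey()) != null) {
                    multiToken.add(doc);
                }
            }
        }
        if (!multiToken.isEmpty()) {
            LOG.warning("Collapse field has more than one token in docs " + multiToken
                    + "; only one token per doc is used for grouping");
        }
        return keyByDoc;
    }

    public static void main(String[] args) {
        // Lower-cased, whitespace-tokenized values:
        // doc 0 = "Apple Inc", doc 1 = "Apple Computer", doc 2 = "apple"
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        postings.put("apple", List.of(0, 1, 2));
        postings.put("computer", List.of(1));
        postings.put("inc", List.of(0));
        System.out.println(collapseKeys(postings)); // {0=inc, 1=computer, 2=apple}
    }
}
```

All three documents are about Apple, yet they land in three groups (inc, computer, apple). A single-token field, such as the case-insensitive string described above, never triggers the warning.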


-Trey


On Tue, Jan 19, 2010 at 5:46 AM, Martijn van Groningen (JIRA) <
jira@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802186#action_12802186]
>
> Martijn van Groningen commented on SOLR-236:
> --------------------------------------------
>
> If the field is tokenized and has more than one token your field collapse
> result will become incorrect. What happens, if I remember correctly, is that
> it will only collapse on the field's last token. This of course leads to
> weird collapse groups. Users who only have one token per collapse field
> are out of luck because of this check. Somehow I think we should let the
> user know that it is not possible to collapse on a tokenized field (at
> least with multiple tokens), maybe by adding a warning to the response. Still I
> think the exception is clearer, but of course it also blocks the single-token case.
>
> bq. Or someone could come after me and write a patch that checks for
> multi-tokened fields somehow and throws an exception.
> Checking if a tokenized field contains only one token is really
> inefficient, because you have to check the collapse field of every
> document. Currently the check is done based on the field's definition in the
> schema.
>
> > Field collapsing
> > ----------------
> >
> >                 Key: SOLR-236
> >                 URL: https://issues.apache.org/jira/browse/SOLR-236
> >             Project: Solr
> >          Issue Type: New Feature
> >          Components: search
> >    Affects Versions: 1.3
> >            Reporter: Emmanuel Keller
> >            Assignee: Shalin Shekhar Mangar
> >             Fix For: 1.5
> >
> >         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch,
> collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch,
> SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch,
> SOLR-236_collapsing.patch
> >
> >
> > This patch includes a new feature called "Field collapsing".
> > "Used in order to collapse a group of results with similar value for a
> given field to a single entry in the result set. Site collapsing is a
> special case of this, where all results for a given web site is collapsed
> into one or two entries in the result set, typically with an associated
> "more documents from this site" link. See also Duplicate detection."
> > http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> > The implementation adds 3 new query parameters (SolrParams):
> > "collapse.field" to choose the field used to group results
> > "collapse.type" normal (default value) or adjacent
> > "collapse.max" to select how many continuous results are allowed before
> collapsing
> > TODO (in progress):
> > - More documentation (on source code)
> > - Test cases
> > Two patches:
> > - "field_collapsing.patch" for current development version
> > - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> > P.S.: Feedback and misspelling corrections are welcome ;-)
>

Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Mike Klaas <mi...@gmail.com>.
On 12-Jun-07, at 2:36 PM, Yonik Seeley wrote:

> On 6/12/07, Mike Klaas <mi...@gmail.com> wrote:
>> The way I do field collapsing is simply gathering documents and
>> collapsing them until I've gathered X groups for user display (which
>> usually involves looking at a few tens of documents more, rather than
>> the entire 3,000,000+ result set).
>
> Isn't this then dependent on the order of the documents in the index?
> Or it sounds like you don't "promote" lower scoring documents into a
> higher scoring group unless they both happen to be in the top docs
> requested?

Precisely.  I don't care how many docs are in a group, just avoiding  
displaying two documents in the same group.  That way you can process  
the docs in score order for essentially zero cost.
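Mike's approach can be sketched as a single pass over the hits in score order that keeps the first document seen for each group and stops once the requested number of groups is reached. The representation (a list of collapse values in score order) and the names are assumptions for illustration. As Yonik points out, a lower-scoring document is never promoted, and no per-group counts are produced.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "collapse until X groups are gathered"; not Solr code.
final class FirstPerGroupSketch {

    // groupInScoreOrder.get(i) is the collapse value of the i-th best hit.
    // Returns the positions of the first (best-scoring) hit of each group,
    // examining hits only until `wantedGroups` groups have been found.
    static List<Integer> topGroups(List<String> groupInScoreOrder, int wantedGroups) {
        List<Integer> representatives = new ArrayList<>();
        Set<String> seenGroups = new HashSet<>();
        for (int i = 0; i < groupInScoreOrder.size()
                && representatives.size() < wantedGroups; i++) {
            if (seenGroups.add(groupInScoreOrder.get(i))) {
                representatives.add(i);
            }
        }
        return representatives;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("x", "x", "y", "x", "z", "w", "y");
        // Stops after examining 5 of the 7 hits.
        System.out.println(topGroups(hits, 3)); // [0, 2, 4]
    }
}
```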

-Mike

Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Yonik Seeley <yo...@apache.org>.
On 6/12/07, Mike Klaas <mi...@gmail.com> wrote:
> The way I do field collapsing is simply gathering documents and
> collapsing them until I've gathered X groups for user display (which
> usually involves looking at a few tens of documents more, rather than
> the entire 3,000,000+ result set).

Isn't this then dependent on the order of the documents in the index?
Or it sounds like you don't "promote" lower scoring documents into a
higher scoring group unless they both happen to be in the top docs
requested?

-Yonik

Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Mike Klaas <mi...@gmail.com>.
On 11-Jun-07, at 5:48 PM, Chris Hostetter wrote:

>
> : Yes, the current JIRA patch uses the FieldCache.
>
> I just meant in contrast with Mike's comment about iterating over all the
> stored fields to support the "post-faceting" situation (but frankly I'm
> not sure that I understand what the "post-faceting" situation is, so feel
> free to ignore me)

I'm not sure either--I assume that it means faceting on a DocSet that is
limited to the representative doc in each collapsed group.  Or is
it faceting within each group?

If so, then all documents in the result set need to be collapsed to
determine this list of docs (which perhaps is not too inefficient?).
The way I do field collapsing is simply gathering documents and  
collapsing them until I've gathered X groups for user display (which  
usually involves looking at a few tens of documents more, rather than  
the entire 3,000,000+ result set).

I'm going to bow out now, as I don't think I understand what exactly  
we're talking about <g>

-Mike

Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Chris Hostetter <ho...@fucit.org>.
: Yes, the current JIRA patch uses the FieldCache.

I just meant in contrast with Mike's comment about iterating over all the
stored fields to support the "post-faceting" situation (but frankly I'm
not sure that I understand what the "post-faceting" situation is, so feel
free to ignore me)

: >... wouldn't iterating over each doc, and using the
: > FieldCache+TermDocs make it very efficient to find all the docs that have
: > the same indexed value as the current one?
:
: The most efficient way will heavily depend on the nature of the
: collapse field (few terms or many).  I can't currently think of a way
: to do it efficiently for both.

this sounds a lot like the faceting problem (to term enum or not to term
enum) and the discussion about building a "facet field cache" at server
startup if we know faceting is important on certain fields ... by default
we can do our best, but with added configuration hints telling us what you
expect, we can make more informed guesses.


-Hoss


Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Yonik Seeley <yo...@apache.org>.
On 6/11/07, Chris Hostetter <ho...@fucit.org> wrote:
>
> : It seems that the only way to do it would be to collapse the entire
> : result set first, which entails loading the stored fields of the
> : whole docset.
> :
> : That doesn't seem particularly feasible to do exactly.
>
> I haven't really been following this conversation that closely, but
> assuming what you guys are talking about is desirable, it seems like one
> way to accomplish it might be to make it operate on the *indexed* values
> for a field

Yes, the current JIRA patch uses the FieldCache.

>... wouldn't iterating over each doc, and using the
> FieldCache+TermDocs make it very efficient to find all the docs that have
> the same indexed value as the current one?

The most efficient way will heavily depend on the nature of the
collapse field (few terms or many).  I can't currently think of a way
to do it efficiently for both.

-Yonik

Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Chris Hostetter <ho...@fucit.org>.
: It seems that the only way to do it would be to collapse the entire
: result set first, which entails loading the stored fields of the
: whole docset.
:
: That doesn't seem particularly feasible to do exactly.

I haven't really been following this conversation that closely, but
assuming what you guys are talking about is desirable, it seems like one
way to accomplish it might be to make it operate on the *indexed* values
for a field ... wouldn't iterating over each doc, and using the
FieldCache+TermDocs make it very efficient to find all the docs that have
the same indexed value as the current one?
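One way to picture this suggestion: assume a FieldCache-like array, here called ordOfDoc, that maps each document id to the ordinal of its single indexed term in the collapse field (a hypothetical stand-in, not Lucene's API). Grouping then works on integers and never loads stored fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; ordOfDoc stands in for a FieldCache-style per-document term ordinal.
final class OrdGroupingSketch {

    // Counts how many matching docs share each indexed term.
    static int[] countPerTerm(int[] ordOfDoc, int numTerms, int[] resultDocs) {
        int[] counts = new int[numTerms];
        for (int doc : resultDocs) {
            counts[ordOfDoc[doc]]++;
        }
        return counts;
    }

    // Finds the matching docs that have the same indexed value as `doc`.
    static List<Integer> sameValueAs(int[] ordOfDoc, int[] resultDocs, int doc) {
        List<Integer> same = new ArrayList<>();
        for (int other : resultDocs) {
            if (ordOfDoc[other] == ordOfDoc[doc]) {
                same.add(other);
            }
        }
        return same;
    }

    public static void main(String[] args) {
        int[] ordOfDoc = {0, 1, 0, 2, 1}; // doc id -> term ordinal
        int[] resultDocs = {0, 2, 3, 4};  // docs matching the query
        System.out.println(Arrays.toString(countPerTerm(ordOfDoc, 3, resultDocs))); // [2, 1, 1]
        System.out.println(sameValueAs(ordOfDoc, resultDocs, 0)); // [0, 2]
    }
}
```

The counts array is sized by the number of distinct terms, which is cheap for a field with few terms but wasteful for a field with many terms and a small result set, where a hash map keyed by ordinal would fit better. That is one concrete reading of the remark that the most efficient way depends on the nature of the collapse field.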


-Hoss


Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Mike Klaas <mi...@gmail.com>.
On 11-Jun-07, at 8:10 AM, Will Johnson wrote:

> And one other point, one of the reasons why it's hard to find an
> example of post-faceting is that many of the major engines can't do it.

It seems that the only way to do it would be to collapse the entire  
result set first, which entails loading the stored fields of the  
whole docset.

That doesn't seem particularly feasible to do exactly.

-Mike

RE: [jira] Commented: (SOLR-236) Field collapsing

Posted by Will Johnson <wj...@GETCONNECTED.COM>.
And one other point, one of the reasons why it's hard to find an example
of post-faceting is that many of the major engines can't do it. 

- will

-----Original Message-----
From: Will Johnson [mailto:wjohnson@getconnected.com] 
Sent: Monday, June 11, 2007 11:05 AM
To: solr-dev@lucene.apache.org
Subject: RE: [jira] Commented: (SOLR-236) Field collapsing

>I assumed they would... I think our signals might be crossed w.r.t.
>the meaning of pre or post collapsing.  Faceting "post collapsing" I
>took to mean that the base docset would be restricted to the top "n"
>of each category.

In my view, faceting should occur on the full collapsed result set, i.e.
break down 100 hits to 50 unique ones, then compute facets on those 50
even though you may only return 10 to the user.

>circuitcity does it how I would expect... field collapsing does not
>affect the facets on the left.
>For example, if I search for memory, a facet tells me that there are
>70 under "Digital Cameras".  If I look down the collapsed results,
>"Digital Cameras" only shows the top match, but has a link to "View
>all 70 matches".

I agree, Circuit City is a use case where you want pre-faceting.  If you
think about site collapsing, though, I may see that there are 57 documents
in my result set of type x; then clicking on type x should show me 57
docs.

>15 documents displayed to the user, or 15 total documents that matched
>the query?
>If the latter, I don't see how you could get greater than 15 for any
>facet count.

If I see that there are 15 of type x and click on it then 'total result
found' on the next page should say 15, not any higher.


-Yonik

RE: [jira] Commented: (SOLR-236) Field collapsing

Posted by Will Johnson <wj...@GETCONNECTED.COM>.
>I assumed they would... I think our signals might be crossed w.r.t.
>the meaning of pre or post collapsing.  Faceting "post collapsing" I
>took to mean that the base docset would be restricted to the top "n"
>of each category.

In my view, faceting should occur on the full collapsed result set, i.e.
break down 100 hits to 50 unique ones, then compute facets on those 50
even though you may only return 10 to the user.
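The difference between the two kinds of faceting can be sketched as follows: "post" collapses the full hit list first (keeping the best-scoring hit per collapse value) and then counts facet values over the survivors, while "pre" counts over all hits. The Hit class, the single facet value per hit and all names are assumptions for illustration; the boolean plays the role of the collapse.facet=[pre|post] switch proposed later in this thread.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of pre- vs post-collapse facet counting; not Solr code.
final class CollapseFacetSketch {

    // One hit: its collapse value and a single facet value.
    static final class Hit {
        final String collapseValue;
        final String facetValue;

        Hit(String collapseValue, String facetValue) {
            this.collapseValue = collapseValue;
            this.facetValue = facetValue;
        }
    }

    // hits are in score order. Pre-collapse counts every hit; post-collapse
    // counts only the first (best-scoring) hit of each collapse group.
    static Map<String, Integer> facetCounts(List<Hit> hits, boolean postCollapse) {
        Map<String, Integer> counts = new TreeMap<>();
        Set<String> seenGroups = new HashSet<>();
        for (Hit hit : hits) {
            if (postCollapse && !seenGroups.add(hit.collapseValue)) {
                continue; // this group is already represented by a better hit
            }
            counts.merge(hit.facetValue, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("site1", "camera"),
                new Hit("site1", "camera"),
                new Hit("site2", "camera"),
                new Hit("site3", "tv"));
        System.out.println(facetCounts(hits, false)); // {camera=3, tv=1}
        System.out.println(facetCounts(hits, true));  // {camera=2, tv=1}
    }
}
```

Post-collapse counts agree with the collapsed result list. Whether they also agree with the collapsed hit count after clicking a facet value depends on the documents in a group sharing that value; if a group mixes facet values, filtering can change which document represents it and the two numbers can differ.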

>circuitcity does it how I would expect... field collapsing does not
>affect the facets on the left.
>For example, if I search for memory, a facet tells me that there are
>70 under "Digital Cameras".  If I look down the collapsed results,
>"Digital Cameras" only shows the top match, but has a link to "View
>all 70 matches".

I agree, Circuit City is a use case where you want pre-faceting.  If you
think about site collapsing, though, I may see that there are 57 documents
in my result set of type x; then clicking on type x should show me 57
docs.

>15 documents displayed to the user, or 15 total documents that matched
>the query?
>If the latter, I don't see how you could get greater than 15 for any
>facet count.

If I see that there are 15 of type x and click on it then 'total result
found' on the next page should say 15, not any higher.


-Yonik

Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Yonik Seeley <yo...@apache.org>.
On 6/11/07, Will Johnson <wj...@getconnected.com> wrote:
> Having worked on a number of customer implementations regarding this
> feature I can say that the number one requirement is for the facet
> counts to be accurate post collapsing.  It all comes down to the user
> experience.  For example, if I run a query that gets collapsed and has a
> facet count for the non-collapsed value then when I click on that facet
> for refinement the number of hits in my subsequent query will not match
> the number of hits displayed by that facet count.

I assumed they would... I think our signals might be crossed w.r.t.
the meaning of pre or post collapsing.  Faceting "post collapsing" I
took to mean that the base docset would be restricted to the top "n"
of each category.

circuitcity does it how I would expect... field collapsing does not
affect the facets on the left.
For example, if I search for memory, a facet tells me that there are
70 under "Digital Cameras".  If I look down the collapsed results,
"Digital Cameras" only shows the top match, but has a link to "View
all 70 matches".

I don't know what bestbuy is doing, but when I search for memory, I
get a brand facet with "Sony (244)"... if I click that, it finds 95
items I can page through (but some facets still display counts higher
than 95).

>  I.e., if it says there
> are 10 docs in my result set of type x then when I click on type x I
> expect to get back 10 hits.

Agree.

> Further, I could easily end up with a
> result set with 15 total hits but a facet count that says there are 200
> results of type x which is very disconcerting from a user perspective.

15 documents displayed to the user, or 15 total documents that matched
the query?
If the latter, I don't see how you could get greater than 15 for any
facet count.

-Yonik

RE: [jira] Commented: (SOLR-236) Field collapsing

Posted by Will Johnson <wj...@GETCONNECTED.COM>.
Having worked on a number of customer implementations regarding this
feature I can say that the number one requirement is for the facet
counts to be accurate post collapsing.  It all comes down to the user
experience.  For example, if I run a query that gets collapsed and has a
facet count for the non-collapsed value then when I click on that facet
for refinement the number of hits in my subsequent query will not match
the number of hits displayed by that facet count.  I.e., if it says there
are 10 docs in my result set of type x then when I click on type x I
expect to get back 10 hits.  Further, I could easily end up with a
result set with 15 total hits but a facet count that says there are 200
results of type x, which is very disconcerting from a user perspective.

I agree that there are times when pre-faceting is also good, but
post-faceting has always been a rather hard requirement for most
ecommerce/data discovery sites.

- will

-----Original Message-----
From: Emmanuel Keller (JIRA) [mailto:jira@apache.org] 
Sent: Sunday, June 10, 2007 7:33 AM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-236) Field collapsing


    [
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.p
lugin.system.issuetabpanels:comment-tabpanel#action_12503162 ] 

Emmanuel Keller commented on SOLR-236:
--------------------------------------

Do we have to make a choice? Both behaviors are interesting.
What about a new parameter like collapse.facet=[pre|post] ?



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch,
SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a
given field to a single entry in the result set. Site collapsing is a
special case of this, where all results for a given web site is
collapsed into one or two entries in the result set, typically with an
associated "more documents from this site" link. See also Duplicate
detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed
before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)



Re: [jira] Commented: (SOLR-236) Field collapsing

Posted by Martijn v Groningen <ma...@gmail.com>.
Yes, I used his patch.

2009/12/23 Noble Paul (JIRA) <ji...@apache.org>:
>
>    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794050#action_12794050 ]
>
> Noble Paul commented on SOLR-236:
> ---------------------------------
>
> Isn't the patch built on the one given by Shalin? The configuration looks different...
>
>> Field collapsing
>> ----------------
>>
>>                 Key: SOLR-236
>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: search
>>    Affects Versions: 1.3
>>            Reporter: Emmanuel Keller
>>            Assignee: Shalin Shekhar Mangar
>>             Fix For: 1.5
>>
>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>>
>>
>> This patch includes a new feature called "Field collapsing".
>> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>> The implementation adds 3 new query parameters (SolrParams):
>> "collapse.field" to choose the field used to group results
>> "collapse.type" normal (default value) or adjacent
>> "collapse.max" to select how many continuous results are allowed before collapsing
>> TODO (in progress):
>> - More documentation (on source code)
>> - Test cases
>> Two patches:
>> - "field_collapsing.patch" for current development version
>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>> P.S.: Feedback and misspelling corrections are welcome ;-)
>



-- 
Met vriendelijke groet,

Martijn van Groningen

RE: [jira] Commented: (SOLR-236) Field collapsing

Posted by Will Johnson <wj...@GETCONNECTED.COM>.
I haven't looked at any of the patches but I can comment on some other uses
for the feature that are in production today with major vendors.  While
it's used for site collapsing a la Google, it's also heavily used in
ecommerce settings.  Check out BestBuy.com/circuitcity/etc and do a
search for some really generic word like 'cable' and notice all the
groups of items; BB shows 3 per group, CC shows 1 per group.  In each
case it's not clear that the number of docs is really limited at all, i.e.
it's more important to get back all the categories with n docs per
category and the counts per category than it is to get back a fixed
number of results or even categories for that matter.  Also notice that
neither of these sites allows you to page through the categorized
results.

I'd also point out that many vendors require the collapsing field to be
an int instead of a string and then force the front end to do the
mapping.  Just one more thing to consider....
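The e-commerce case above (every category, n documents per category and a count per category, with no fixed overall number of results) could be sketched as below. String category values, parallel input lists and all names are assumptions; as noted above, a vendor might use int category ids instead and map them to labels in the front end.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "n docs per category plus a count per category"; not Solr code.
final class TopNPerGroupSketch {

    static final class Group {
        final List<String> topDocs = new ArrayList<>();
        int totalHits;
    }

    // docIds and categories are parallel lists, in score order. Every category
    // is returned, ordered by its best-scoring hit, with at most n docs and
    // the total number of hits in that category.
    static Map<String, Group> group(List<String> docIds, List<String> categories, int n) {
        Map<String, Group> groups = new LinkedHashMap<>();
        for (int i = 0; i < docIds.size(); i++) {
            Group g = groups.computeIfAbsent(categories.get(i), key -> new Group());
            g.totalHits++;
            if (g.topDocs.size() < n) {
                g.topDocs.add(docIds.get(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> docIds = List.of("d1", "d2", "d3", "d4", "d5");
        List<String> categories = List.of("cables", "adapters", "cables", "cables", "adapters");
        for (Map.Entry<String, Group> e : group(docIds, categories, 2).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().topDocs
                    + " (" + e.getValue().totalHits + " matches)");
        }
        // cables [d1, d3] (3 matches)
        // adapters [d2, d5] (2 matches)
    }
}
```

Unlike stopping after the first X groups, this has to scan every hit to produce accurate per-category counts.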

- will

 

-----Original Message-----
From: Yonik Seeley (JIRA) [mailto:jira@apache.org] 
Sent: Tuesday, June 05, 2007 9:01 AM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-236) Field collapsing


    [
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.p
lugin.system.issuetabpanels:comment-tabpanel#action_12501550 ] 

Yonik Seeley commented on SOLR-236:
-----------------------------------

I guess adjacent collapsing can make sense when one is sorting by the
field that is being collapsed.

For the normal collapsing though, this patch appears to implement it by
changing the sort order to the collapsing field (normally not desired).
For example, if sorting by relevance and collapsing on a field, one
would normally want the groups sorted by relevance (with the group
relevance defined as the max score of its members).

As far as how to do paging, it makes sense to rigidly define it in terms
of number of documents, regardless of how many documents are in each
group.  Going back to Google, it always displays the first 10 documents,
but a variable number of groups.   That does mean that a group could be
split across pages.  It would actually be much simpler (IMO) to always
return a fixed number of groups rather than a fixed number of documents,
but I don't think this would be less useful to people.  Thoughts?
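The paging proposal above can be sketched as follows: group relevance is the max score of a group's members, groups are ordered by it, and pages are cut by a fixed number of documents rather than groups. The Doc class, the in-memory input list and all names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of paging collapsed results by document count; not Solr code.
final class DocPagingSketch {

    static final class Doc {
        final String id;
        final String group;
        final double score;

        Doc(String id, String group, double score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    // Orders groups by the max score of their members, keeps each group's
    // documents together (best first), then cuts a page of `rows` documents
    // starting at `start`. A group can therefore be split across two pages.
    static List<Doc> page(List<Doc> hits, int start, int rows) {
        Map<String, List<Doc>> byGroup = new LinkedHashMap<>();
        Map<String, Double> maxScore = new LinkedHashMap<>();
        for (Doc d : hits) {
            byGroup.computeIfAbsent(d.group, g -> new ArrayList<>()).add(d);
            maxScore.merge(d.group, d.score, (a, b) -> Math.max(a, b));
        }
        List<String> groupOrder = new ArrayList<>(byGroup.keySet());
        groupOrder.sort(Comparator.comparingDouble((String g) -> maxScore.get(g)).reversed());

        List<Doc> flattened = new ArrayList<>();
        for (String g : groupOrder) {
            List<Doc> members = byGroup.get(g);
            members.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
            flattened.addAll(members);
        }
        int from = Math.min(start, flattened.size());
        int to = Math.min(start + rows, flattened.size());
        return new ArrayList<>(flattened.subList(from, to));
    }

    public static void main(String[] args) {
        List<Doc> hits = List.of(
                new Doc("a1", "a", 0.9), new Doc("b1", "b", 0.8),
                new Doc("a2", "a", 0.7), new Doc("c1", "c", 0.6),
                new Doc("b2", "b", 0.5));
        for (int start = 0; start < hits.size(); start += 3) {
            List<String> ids = new ArrayList<>();
            for (Doc d : page(hits, start, 3)) {
                ids.add(d.id);
            }
            System.out.println("page starting at " + start + ": " + ids);
        }
        // page starting at 0: [a1, a2, b1]
        // page starting at 3: [b2, c1]
    }
}
```

With rows=3, group b starts on the first page and finishes on the second, which is the split described above; paging by a fixed number of groups would avoid it at the cost of a variable page size.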

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch,
SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a
given field to a single entry in the result set. Site collapsing is a
special case of this, where all results for a given web site is
collapsed into one or two entries in the result set, typically with an
associated "more documents from this site" link. See also Duplicate
detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed
before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)



Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by Martijn v Groningen <ma...@gmail.com>.
Yes, I can reproduce the same situation here. I will update the patch
asap and add it to Jira.

Martijn

2009/12/7 Marc Sturlese <ma...@gmail.com>:
>
> Hey! Got it working!
> The problem was that my uniqueField is indexed as long and it's not supported
> by the patch.
> The value is obtained in the getCollapseGroupResult function in
> AbstractCollapseCollector.java as:
>
> String schemaId = searcher.doc(docId).get(uniqueIdFieldname);
>
> To support long, int, slong, sint, float, sfloat, ...
> it should be obtained by doing something like:
>
> // Convert the stored unique key to its readable form so that numeric
> // key types (long, int, slong, sint, float, sfloat, ...) also work.
> FieldType idFieldType = searcher.getSchema().getFieldType(uniqueIdFieldname);
> String schemaId = "";
> Fieldable idField = null;
> try {
>     idField = searcher.doc(docId).getFieldable(uniqueIdFieldname);
> } catch (IOException ex) {
>     // deal with exception (e.g. log it or rethrow)
> }
> if (idField != null) {
>     schemaId = idFieldType.storedToReadable(idField);
> }
>
>
> --
> View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679520.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
>
>



-- 
Kind regards,

Martijn van Groningen

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by Marc Sturlese <ma...@gmail.com>.
Hey! Got it working!
The problem was that my uniqueField is indexed as a long, which the patch
does not support.
The value is obtained in the getCollapseGroupResult function in
AbstractCollapseCollector.java as:

String schemaId = searcher.doc(docId).get(uniqueIdFieldname);

To support long, int, slong, sint, float, sfloat, etc., it should be
obtained with something like:

// Needs: java.io.IOException, org.apache.lucene.document.Fieldable,
//        org.apache.solr.schema.FieldType
FieldType idFieldType =
    searcher.getSchema().getFieldType(uniqueIdFieldname);
String schemaId = "";
Fieldable name_field = null;
try {
    name_field = searcher.doc(docId).getFieldable(uniqueIdFieldname);
} catch (IOException ex) {
    // deal with exception
}
if (name_field != null) {
    // Let the field type turn the stored value (which may be encoded or
    // binary for numeric types) into its readable external form.
    schemaId = idFieldType.storedToReadable(name_field);
}


Martijn v Groningen wrote:
> 
> The last two parameters are not necessary, since they both default to
> true. Did the field collapse tests run successfully?
> 
> 2009/12/7 Marc Sturlese <ma...@gmail.com>:
>>
>> The request I am sending is:
>> http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true
>>
>> I search for 'aaa' in the content field. All the documents in the result
>> contain that string in the content field.
>>
>> Martijn v Groningen wrote:
>>>
>>> Yes, it should look similar to that. What is the exact request you send
>>> to Solr?
>>> Also, to check whether the patch works correctly, can you run: ant clean test
>>> There are a number of tests that test the field collapse functionality.
>>>
>>> Martijn
>>>
>>>
>>> 2009/12/7 Marc Sturlese <ma...@gmail.com>:
>>>>
>>>>><lst name="collapse_counts">
>>>>>   <str name="field">cat</str>
>>>>>    <lst name="results">
>>>>>        <lst name="009">
>>>>>            <str name="fieldValue">hard</str>
>>>>>           <int name="collapseCount">1</int>
>>>>>            <result name="collapsedDocs" numFound="1" start="0">
>>>>>                 <doc>
>>>>>                    <long name="id">008</long>
>>>>>                    <str name="content">aaa aaa</str>
>>>>>                    <str name="col">ccc</str>
>>>>>                 </doc>
>>>>>            </result>
>>>>>        </lst>
>>>>>        ...
>>>>>    </lst>
>>>>></lst>
>>>> I see, it looks like I am applying the patch wrongly somehow.
>>>> This is the complete collapse_counts response I am getting:
>>>> <lst name="collapse_counts">
>>>>  <str name="field">col</str>
>>>>  <lst name="results">
>>>>    <lst>
>>>>      <int name="collapseCount">1</int>
>>>>      <int name="collapseCount">1</int>
>>>>      <int name="collapseCount">1</int>
>>>>      <str name="fieldValue">bbb</str>
>>>>      <str name="fieldValue">ccc</str>
>>>>      <str name="fieldValue">xxx</str>
>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>        <doc>
>>>>          <long name="id">2</long>
>>>>          <str name="content">aaa aaa</str>
>>>>          <str name="col">bbb</str>
>>>>        </doc>
>>>>      </result>
>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>        <doc>
>>>>          <long name="id">8</long>
>>>>          <str name="content">aaa aaa aaa sd</str>
>>>>          <str name="col">ccc</str>
>>>>       </doc>
>>>>      </result>
>>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>>        <doc>
>>>>          <long name="id">12</long>
>>>>          <str name="content">aaa aaa aaa v</str>
>>>>          <str name="col">xxx</str>
>>>>        </doc>
>>>>      </result>
>>>>    </lst>
>>>>  </lst>
>>>> </lst>
>>>>
>>>> As you can see, I am getting a <lst> tag with no name. As I understood
>>>> what you told me, I should be getting as many lst tags as collapsed
>>>> groups, and the name attribute of each lst should be the unique field
>>>> value. So, if the patch was applied correctly, the response should look
>>>> like:
>>>>
>>>> <lst name="collapse_counts">
>>>>  <str name="field">col</str>
>>>>  <lst name="results">
>>>>    <lst name="354"> (the head value of the collapsed group)
>>>>      <int name="collapseCount">1</int>
>>>>      <str name="fieldValue">bbb</str>
>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>        <doc>
>>>>          <long name="id">2</long>
>>>>          <str name="content">aaa aaa</str>
>>>>          <str name="col">bbb</str>
>>>>        </doc>
>>>>      </result>
>>>>    </lst>
>>>>    <lst name="654">
>>>>      <int name="collapseCount">1</int>
>>>>      <str name="fieldValue">ccc</str>
>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>        <doc>
>>>>          <long name="id">8</long>
>>>>          <str name="content">aaa aaa aaa sd</str>
>>>>          <str name="col">ccc</str>
>>>>       </doc>
>>>>      </result>
>>>>    </lst>
>>>>    <lst name="654">
>>>>      <int name="collapseCount">1</int>
>>>>      <str name="fieldValue">xxx</str>
>>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>>        <doc>
>>>>          <long name="id">12</long>
>>>>          <str name="content">aaa aaa aaa v</str>
>>>>          <str name="col">xxx</str>
>>>>        </doc>
>>>>      </result>
>>>>    </lst>
>>>>  </lst>
>>>> </lst>
>>>>
>>>> Is this what the response looks like when you use the patch?
>>>> Thanks in advance
>>>>
>>>>
>>>> Martijn v Groningen wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> I'm not sure if I follow you completely, but the example you gave is
>>>>> not complete. I'm missing a few tags in your example. Let's assume the
>>>>> following response, which the latest patches produce.
>>>>>
>>>>> <lst name="collapse_counts">
>>>>>     <str name="field">cat</str>
>>>>>     <lst name="results">
>>>>>         <lst name="009">
>>>>>             <str name="fieldValue">hard</str>
>>>>>             <int name="collapseCount">1</int>
>>>>>             <result name="collapsedDocs" numFound="1" start="0">
>>>>>                  <doc>
>>>>>                     <long name="id">008</long>
>>>>>                     <str name="content">aaa aaa</str>
>>>>>                     <str name="col">ccc</str>
>>>>>                  </doc>
>>>>>             </result>
>>>>>         </lst>
>>>>>         ...
>>>>>     </lst>
>>>>> </lst>
>>>>>
>>>>> The result list contains collapse groups. The names of the child
>>>>> elements are the collapse head ids. Everything that falls under a
>>>>> collapse head belongs to that collapse group, so adding the document
>>>>> head id to the field value is unnecessary. In the above example,
>>>>> document 009 is the head document of document 008, and document 009
>>>>> should be displayed in the search result.
>>>>>
>>>>> From what you have said, it seems that you properly configured the
>>>>> patch.
>>>>>
>>>>> Martijn
>>>>>
>>>>> 2009/12/7 Marc Sturlese <ma...@gmail.com>:
>>>>>>
>>>>>> Hey there, I have been testing the latest patch, and either I am
>>>>>> missing something or the way collapsed documents are shown with
>>>>>> adjacent collapsing can sometimes be confusing.
>>>>>> I am using the patch with CollapseComponent replacing QueryComponent
>>>>>> (not using both at the same time):
>>>>>>  <searchComponent name="query"
>>>>>> class="org.apache.solr.handler.component.CollapseComponent" />
>>>>>> What I have noticed is this. Imagine you get these results in the search:
>>>>>> doc1:
>>>>>>   id:001
>>>>>>   collapseField:ccc
>>>>>> doc2:
>>>>>>   id:002
>>>>>>   collapseField:aaa
>>>>>> doc3:
>>>>>>   id:003
>>>>>>   collapseField:ccc
>>>>>> doc4:
>>>>>>   id:004
>>>>>>   collapseField:bbb
>>>>>>
>>>>>> And in the collapse_counts you get:
>>>>>> <int name="collapseCount">1</int>
>>>>>> <str name="fieldValue">ccc</str>
>>>>>> <result name="collapsedDocs" numFound="1" start="0">
>>>>>> <doc>
>>>>>> <long name="id">008</long>
>>>>>> <str name="content">aaa aaa</str>
>>>>>> <str name="col">ccc</str>
>>>>>> </doc>
>>>>>> </result>
>>>>>>
>>>>>> Now, how can I know which document is the head of doc 008? It could be
>>>>>> either 001 or 003. Wouldn't it make sense to connect the uniqueField
>>>>>> with the collapsed documents in some way?
>>>>>>
>>>>>> Adding something to collapse_counts like:
>>>>>> <int name="collapseCount">1</int>
>>>>>> <str name="fieldValue">ccc</str>
>>>>>> <str name="uniqueFieldId">003</str>
>>>>>>
>>>>>> I have currently hacked FieldValueCountCollapseCollectorFactory to
>>>>>> return:
>>>>>> <str name="fieldValue">ccc#003</str>
>>>>>> but this response looks dirty...
>>>>>>
>>>>>> As I said, maybe I am misunderstanding something and this can be
>>>>>> known in some way. In that case, can someone tell me how?
>>>>>> Thanks in advance
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> JIRA jira@apache.org wrote:
>>>>>>>
>>>>>>>
>>>>>>>     [
>>>>>>> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
>>>>>>> ]
>>>>>>>
>>>>>>> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
>>>>>>> ----------------------------------------------------------------------
>>>>>>>
>>>>>>> I have attached a new patch that has the following changes:
>>>>>>> # Added caching for the field collapse functionality. Check the
>>>>>>> [solr wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to
>>>>>>> configure field-collapsing with caching.
>>>>>>> # Removed the collapse.max parameter (collapse.threshold must be used
>>>>>>> instead). It was deprecated for a long time.
>>>>>>>
>>>>>>>> Field collapsing
>>>>>>>> ----------------
>>>>>>>>
>>>>>>>>                 Key: SOLR-236
>>>>>>>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>>>>>>>             Project: Solr
>>>>>>>>          Issue Type: New Feature
>>>>>>>>          Components: search
>>>>>>>>    Affects Versions: 1.3
>>>>>>>>            Reporter: Emmanuel Keller
>>>>>>>>             Fix For: 1.5
>>>>>>>>
>>>>>>>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>>>>>>>> collapsing-patch-to-1.3.0-ivan.patch,
>>>>>>>> collapsing-patch-to-1.3.0-ivan_2.patch,
>>>>>>>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>>>>>>>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch,
>>>>>>>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>>>>>>>> field-collapsing-extended-592129.patch,
>>>>>>>> field_collapsing_1.1.0.patch,
>>>>>>>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>>> solr-236.patch, SOLR-236_collapsing.patch,
>>>>>>>> SOLR-236_collapsing.patch
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26674651.html
>>>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Kind regards,
>>>>>
>>>>> Martijn van Groningen
>>>>>
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26678606.html
>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Kind regards,
>>>
>>> Martijn van Groningen
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679037.html
>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Kind regards,
> 
> Martijn van Groningen
> 
> 

-- 
View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679520.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
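Martijn's explanation in the quoted thread is that the name of each group `<lst>` under `results` is the head id, so a client can recover the head of any collapsed document by walking that structure. Below is a minimal sketch using only the JDK's DOM parser. The class and method names are hypothetical, and it assumes the layout from Martijn's example response: one named `<lst>` per group, with each collapsed `<doc>` carrying a field named `id`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CollapseHeads {
    /** Maps each collapsed document id to the head id of its collapse group. */
    public static Map<String, String> headByCollapsedId(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> heads = new LinkedHashMap<>();
        NodeList lists = dom.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element results = (Element) lists.item(i);
            if (!"results".equals(results.getAttribute("name"))) {
                continue;
            }
            // Each direct <lst> child of "results" is one group, named after its head.
            for (Node n = results.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element group) || !"lst".equals(group.getTagName())) {
                    continue;
                }
                String headId = group.getAttribute("name"); // "" if the name is missing
                NodeList docs = group.getElementsByTagName("doc");
                for (int d = 0; d < docs.getLength(); d++) {
                    // Pick the child element of <doc> whose name attribute is "id".
                    for (Node f = docs.item(d).getFirstChild(); f != null; f = f.getNextSibling()) {
                        if (f instanceof Element field && "id".equals(field.getAttribute("name"))) {
                            heads.put(field.getTextContent().trim(), headId);
                        }
                    }
                }
            }
        }
        return heads;
    }

    public static void main(String[] args) throws Exception {
        String xml = """
            <lst name="collapse_counts">
              <str name="field">cat</str>
              <lst name="results">
                <lst name="009">
                  <str name="fieldValue">hard</str>
                  <int name="collapseCount">1</int>
                  <result name="collapsedDocs" numFound="1" start="0">
                    <doc><long name="id">008</long><str name="col">ccc</str></doc>
                  </result>
                </lst>
              </lst>
            </lst>
            """;
        System.out.println(headByCollapsedId(xml)); // {008=009}
    }
}
```

Run against the response Marc was getting, where all groups sit in a single `<lst>` without a `name`, every collapsed id maps to an empty string, because DOM's `getAttribute` returns `""` for a missing attribute. That makes it a quick check for a patch that was not applied as intended.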


Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by Martijn v Groningen <ma...@gmail.com>.
The last two parameters are not necessary, since they both default to
true. Did the field collapse tests run successfully?
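Because `collapse.info.doc` and `collapse.info.count` both default to true, the request quoted below can simply drop them. As a small illustration, this sketch rebuilds that request with only the other parameters, URL-encoding values with `java.net.URLEncoder`. The host, port, and field names come from Marc's URL; the `CollapseRequest` class and `buildUrl` helper are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CollapseRequest {
    /** Builds a GET URL; parameter values are URL-encoded, names are used as-is. */
    public static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in the order of the original request.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "aaa");
        params.put("version", "2.2");
        params.put("start", "0");
        params.put("rows", "20");
        params.put("indent", "on");
        params.put("collapse.field", "col");
        params.put("collapse.includeCollapsedDocs.fl", "*");
        params.put("collapse.type", "adjacent");
        // collapse.info.doc and collapse.info.count are omitted: both default to true.
        System.out.println(buildUrl("http://localhost:8983/solr/select/", params));
    }
}
```

The printed URL is Marc's request minus the two trailing `collapse.info.*` parameters; the response should be the same as long as those defaults hold in the patch version being tested.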

2009/12/7 Marc Sturlese <ma...@gmail.com>:
>
> The request I am sending is:
> http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true
>
> I search for 'aaa' in the content field. All the documents in the result
> contain that string in the content field.
>
> Martijn v Groningen wrote:
>>
>> Yes, it should look similar to that. What is the exact request you send to
>> Solr?
>> Also, to check whether the patch works correctly, can you run: ant clean test
>> There are a number of tests that test the field collapse functionality.
>>
>> Martijn
>>
>>
>> 2009/12/7 Marc Sturlese <ma...@gmail.com>:
>>>
>>>><lst name="collapse_counts">
>>>>   <str name="field">cat</str>
>>>>    <lst name="results">
>>>>        <lst name="009">
>>>>            <str name="fieldValue">hard</str>
>>>>           <int name="collapseCount">1</int>
>>>>            <result name="collapsedDocs" numFound="1" start="0">
>>>>                 <doc>
>>>>                    <long name="id">008</long>
>>>>                    <str name="content">aaa aaa</str>
>>>>                    <str name="col">ccc</str>
>>>>                 </doc>
>>>>            </result>
>>>>        </lst>
>>>>        ...
>>>>    </lst>
>>>></lst>
>>> I see, it looks like I am applying the patch wrongly somehow.
>>> This is the complete collapse_counts response I am getting:
>>> <lst name="collapse_counts">
>>>  <str name="field">col</str>
>>>  <lst name="results">
>>>    <lst>
>>>      <int name="collapseCount">1</int>
>>>      <int name="collapseCount">1</int>
>>>      <int name="collapseCount">1</int>
>>>      <str name="fieldValue">bbb</str>
>>>      <str name="fieldValue">ccc</str>
>>>      <str name="fieldValue">xxx</str>
>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>        <doc>
>>>          <long name="id">2</long>
>>>          <str name="content">aaa aaa</str>
>>>          <str name="col">bbb</str>
>>>        </doc>
>>>      </result>
>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>        <doc>
>>>          <long name="id">8</long>
>>>          <str name="content">aaa aaa aaa sd</str>
>>>          <str name="col">ccc</str>
>>>       </doc>
>>>      </result>
>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>        <doc>
>>>          <long name="id">12</long>
>>>          <str name="content">aaa aaa aaa v</str>
>>>          <str name="col">xxx</str>
>>>        </doc>
>>>      </result>
>>>    </lst>
>>>  </lst>
>>> </lst>
>>>
>>> As you can see, I am getting a <lst> tag with no name. As I understood
>>> what you told me, I should be getting as many lst tags as collapsed
>>> groups, and the name attribute of each lst should be the unique field
>>> value. So, if the patch was applied correctly, the response should look
>>> like:
>>>
>>> <lst name="collapse_counts">
>>>  <str name="field">col</str>
>>>  <lst name="results">
>>>    <lst name="354"> (the head value of the collapsed group)
>>>      <int name="collapseCount">1</int>
>>>      <str name="fieldValue">bbb</str>
>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>        <doc>
>>>          <long name="id">2</long>
>>>          <str name="content">aaa aaa</str>
>>>          <str name="col">bbb</str>
>>>        </doc>
>>>      </result>
>>>    </lst>
>>>    <lst name="654">
>>>      <int name="collapseCount">1</int>
>>>      <str name="fieldValue">ccc</str>
>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>        <doc>
>>>          <long name="id">8</long>
>>>          <str name="content">aaa aaa aaa sd</str>
>>>          <str name="col">ccc</str>
>>>       </doc>
>>>      </result>
>>>    </lst>
>>>    <lst name="654">
>>>      <int name="collapseCount">1</int>
>>>      <str name="fieldValue">xxx</str>
>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>        <doc>
>>>          <long name="id">12</long>
>>>          <str name="content">aaa aaa aaa v</str>
>>>          <str name="col">xxx</str>
>>>        </doc>
>>>      </result>
>>>    </lst>
>>>  </lst>
>>> </lst>
>>>
>>> Is this what the response looks like when you use the patch?
>>> Thanks in advance
>>>
>>>
>>> Martijn v Groningen wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> I'm not sure if I follow you completely, but the example you gave is
>>>> not complete. I'm missing a few tags in your example. Let's assume the
>>>> following response, which the latest patches produce.
>>>>
>>>> <lst name="collapse_counts">
>>>>     <str name="field">cat</str>
>>>>     <lst name="results">
>>>>         <lst name="009">
>>>>             <str name="fieldValue">hard</str>
>>>>             <int name="collapseCount">1</int>
>>>>             <result name="collapsedDocs" numFound="1" start="0">
>>>>                  <doc>
>>>>                     <long name="id">008</long>
>>>>                     <str name="content">aaa aaa</str>
>>>>                     <str name="col">ccc</str>
>>>>                  </doc>
>>>>             </result>
>>>>         </lst>
>>>>         ...
>>>>     </lst>
>>>> </lst>
>>>>
>>>> The result list contains collapse groups. The names of the child
>>>> elements are the collapse head ids. Everything that falls under a
>>>> collapse head belongs to that collapse group, so adding the document
>>>> head id to the field value is unnecessary. In the above example,
>>>> the document with id 009 is the head of the document with id 008.
>>>> The document with id 009 should be displayed in the search result.
>>>>
>>>> From what you have said, it seems that you properly configured the
>>>> patch.
>>>>
>>>> Martijn
>>>>
>>>> 2009/12/7 Marc Sturlese <ma...@gmail.com>:
>>>>>
>>>>> Hey there, I have been testing the last patch, and either I am missing
>>>>> something or the way the collapsed documents are shown with adjacent
>>>>> collapsing can sometimes be confusing.
>>>>> I am using the patch by replacing queryComponent with collapseComponent
>>>>> (not using both at the same time):
>>>>>  <searchComponent name="query"
>>>>> class="org.apache.solr.handler.component.CollapseComponent">
>>>>> What I have noticed: imagine you get these results in the search:
>>>>> doc1:
>>>>>   id:001
>>>>>   collapseField:ccc
>>>>> doc2:
>>>>>   id:002
>>>>>   collapseField:aaa
>>>>> doc3:
>>>>>   id:003
>>>>>   collapseField:ccc
>>>>> doc4:
>>>>>   id:004
>>>>>   collapseField:bbb
>>>>>
>>>>> And in the collapse_counts you get:
>>>>> <int name="collapseCount">1</int>
>>>>> <str name="fieldValue">ccc</str>
>>>>> <result name="collapsedDocs" numFound="1" start="0">
>>>>> <doc>
>>>>> <long name="id">008</long>
>>>>> <str name="content">aaa aaa</str>
>>>>> <str name="col">ccc</str>
>>>>> </doc>
>>>>> </result>
>>>>>
>>>>> Now, how can I know the head document of doc 008? Both 001 and 003
>>>>> could be. Wouldn't it make sense to connect the uniqueField with the
>>>>> collapsed documents in some way?
>>>>>
>>>>> Adding something to collapse_counts like:
>>>>> <int name="collapseCount">1</int>
>>>>> <str name="fieldValue">ccc</str>
>>>>> <str name="uniqueFieldId">003</str>
>>>>>
>>>>> I currently have hacked FieldValueCountCollapseCollectorFactory to
>>>>> return:
>>>>> <str name="fieldValue">ccc#003</str>
>>>>> but this response looks dirty...
>>>>>
>>>>> As I said, maybe I am misunderstanding something and this can be known
>>>>> in some way. In that case, can someone tell me how?
>>>>> Thanks in advance
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> JIRA jira@apache.org wrote:
>>>>>>
>>>>>>
>>>>>>     [
>>>>>> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
>>>>>> ]
>>>>>>
>>>>>> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
>>>>>> ----------------------------------------------------------------------
>>>>>>
>>>>>> I have attached a new patch that has the following changes:
>>>>>> # Added caching for the field collapse functionality. Check the [solr
>>>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
>>>>>> field-collapsing with caching.
>>>>>> # Removed the collapse.max parameter (collapse.threshold must be used
>>>>>> instead). It was deprecated for a long time.
>>>>>>
>>>>>>
>>>>>>> Field collapsing
>>>>>>> ----------------
>>>>>>>
>>>>>>>                 Key: SOLR-236
>>>>>>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>>>>>>             Project: Solr
>>>>>>>          Issue Type: New Feature
>>>>>>>          Components: search
>>>>>>>    Affects Versions: 1.3
>>>>>>>            Reporter: Emmanuel Keller
>>>>>>>             Fix For: 1.5
>>>>>>>
>>>>>>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>>>>>>> collapsing-patch-to-1.3.0-ivan.patch,
>>>>>>> collapsing-patch-to-1.3.0-ivan_2.patch,
>>>>>>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>>>>>>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch,
>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch,
>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch,
>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch,
>>>>>>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>>>>>>> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
>>>>>>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>>>>>>>
>>>>>>>
>>>>>>> This patch includes a new feature called "Field collapsing".
>>>>>>> "Used in order to collapse a group of results with similar value for
>>>>>>> a
>>>>>>> given field to a single entry in the result set. Site collapsing is a
>>>>>>> special case of this, where all results for a given web site is
>>>>>>> collapsed
>>>>>>> into one or two entries in the result set, typically with an
>>>>>>> associated
>>>>>>> "more documents from this site" link. See also Duplicate detection."
>>>>>>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>>>>>>> The implementation adds 3 new query parameters (SolrParams):
>>>>>>> "collapse.field" to choose the field used to group results
>>>>>>> "collapse.type" normal (default value) or adjacent
>>>>>>> "collapse.max" to select how many continuous results are allowed
>>>>>>> before
>>>>>>> collapsing
>>>>>>> TODO (in progress):
>>>>>>> - More documentation (on source code)
>>>>>>> - Test cases
>>>>>>> Two patches:
>>>>>>> - "field_collapsing.patch" for current development version
>>>>>>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>>>>>>> P.S.: Feedback and misspelling correction are welcome ;-)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26674651.html
>>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Met vriendelijke groet,
>>>>
>>>> Martijn van Groningen
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26678606.html
>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen
>>
>>
>
> --
> View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679037.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
>
>



-- 
Met vriendelijke groet,

Martijn van Groningen
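A minimal client-side sketch of reading the response layout Martijn describes above, assuming one named lst per collapse group inside lst name="results", named after the head document's id, and a uniqueKey "id" stored as long as in this thread's examples. It uses only Python's standard library; parse_collapse_counts and SAMPLE are hypothetical names, not part of Solr or the patch. It deliberately rejects the unnamed-lst form Marc reported, because that form cannot tie groups to heads.

```python
import xml.etree.ElementTree as ET

# Example response copied from Martijn's explanation in this thread.
SAMPLE = """
<lst name="collapse_counts">
  <str name="field">cat</str>
  <lst name="results">
    <lst name="009">
      <str name="fieldValue">hard</str>
      <int name="collapseCount">1</int>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">008</long>
          <str name="content">aaa aaa</str>
          <str name="col">ccc</str>
        </doc>
      </result>
    </lst>
  </lst>
</lst>
"""


def parse_collapse_counts(xml_text):
    """Map each collapse head id to its field value, count and collapsed doc ids."""
    root = ET.fromstring(xml_text.strip())
    results = root.find("lst[@name='results']")
    groups = {}
    if results is None:
        return groups
    for group in results.findall("lst"):
        head_id = group.get("name")
        if head_id is None:
            # An unnamed <lst> bundling several fieldValue/collapseCount entries
            # (like the one Marc reported) does not say which head each group
            # belongs to, so refuse to guess.
            raise ValueError("collapse group without a head id")
        groups[head_id] = {
            "fieldValue": group.findtext("str[@name='fieldValue']"),
            "collapseCount": int(group.findtext("int[@name='collapseCount']", default="0")),
            "collapsedIds": [
                doc.findtext("long[@name='id']")
                for doc in group.findall("result[@name='collapsedDocs']/doc")
            ],
        }
    return groups


print(parse_collapse_counts(SAMPLE))
# {'009': {'fieldValue': 'hard', 'collapseCount': 1, 'collapsedIds': ['008']}}
```

Keying the result on the head id rather than on fieldValue is what lets a client attach collapsed documents to the right entry in the main result list.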

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by Marc Sturlese <ma...@gmail.com>.
The request I am sending is:
http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true

I search for 'aaa' in the content field. All the documents in the result
contain that string in the content field.
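The request above can also be built from a script. A small sketch using Python's standard urllib.parse.urlencode; the function name collapse_query_url and its defaults are illustrative assumptions, while the parameter names and values are copied from the URL above. Passing safe="*" keeps the field-list wildcard unencoded so the output matches that URL exactly.

```python
from urllib.parse import urlencode

# Base URL of the example Solr instance used in this thread.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def collapse_query_url(q, field, collapse_type="adjacent", rows=20, start=0):
    """Build a select URL carrying the field-collapse parameters from the thread."""
    params = [
        ("q", q),
        ("version", "2.2"),
        ("start", str(start)),
        ("rows", str(rows)),
        ("indent", "on"),
        ("collapse.field", field),                  # field used to group results
        ("collapse.includeCollapsedDocs.fl", "*"),  # field list for collapsedDocs
        ("collapse.type", collapse_type),           # "normal" (default) or "adjacent"
        ("collapse.info.doc", "true"),              # copied from the request above
        ("collapse.info.count", "true"),            # copied from the request above
    ]
    # A list of tuples keeps the parameter order; safe="*" avoids encoding "*" as %2A.
    return SOLR_SELECT + "?" + urlencode(params, safe="*")


print(collapse_query_url("aaa", "col"))
```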

Martijn v Groningen wrote:
> 
> Yes, it should look similar to that. What is the exact request you send to
> Solr?
> Also, to check whether the patch works correctly, can you run: ant clean test
> There are a number of tests that cover the field collapse functionality.
> 
> Martijn
> 
> 
> 
> 
> 
> -- 
> Met vriendelijke groet,
> 
> Martijn van Groningen
> 
> 



Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by Martijn v Groningen <ma...@gmail.com>.
Yes, it should look similar to that. What is the exact request you send to Solr?
Also, to check whether the patch works correctly, can you run: ant clean test
There are a number of tests that cover the field collapse functionality.

Martijn


2009/12/7 Marc Sturlese <ma...@gmail.com>:
>
>><lst name="collapse_counts">
>>   <str name="field">cat</str>
>>    <lst name="results">
>>        <lst name="009">
>>            <str name="fieldValue">hard</str>
>>           <int name="collapseCount">1</int>
>>            <result name="collapsedDocs" numFound="1" start="0">
>>                 <doc>
>>                    <long name="id">008</long>
>>                    <str name="content">aaa aaa</str>
>>                    <str name="col">ccc</str>
>>                 </doc>
>>            </result>
>>        </lst>
>>        ...
>>    </lst>
>></lst>
> I see, it looks like I am somehow applying the patch wrongly.
> This is the complete collapse_counts response I am getting:
> <lst name="collapse_counts">
>  <str name="field">col</str>
>  <lst name="results">
>    <lst>
>      <int name="collapseCount">1</int>
>      <int name="collapseCount">1</int>
>      <int name="collapseCount">1</int>
>      <str name="fieldValue">bbb</str>
>      <str name="fieldValue">ccc</str>
>      <str name="fieldValue">xxx</str>
>      <result name="collapsedDocs" numFound="1" start="0">
>        <doc>
>          <long name="id">2</long>
>          <str name="content">aaa aaa</str>
>          <str name="col">bbb</str>
>        </doc>
>      </result>
>      <result name="collapsedDocs" numFound="1" start="0">
>        <doc>
>          <long name="id">8</long>
>          <str name="content">aaa aaa aaa sd</str>
>          <str name="col">ccc</str>
>       </doc>
>      </result>
>      <result name="collapsedDocs" numFound="4" start="0">
>        <doc>
>          <long name="id">12</long>
>          <str name="content">aaa aaa aaa v</str>
>          <str name="col">xxx</str>
>        </doc>
>      </result>
>    </lst>
>  </lst>
> </lst>
>
> As you can see, I am getting a <lst> tag with no name. As I understood what
> you told me, I should be getting as many lst tags as collapsed groups, and
> the name attribute of each lst should be the unique field value. So, if the
> patch was applied correctly, the response should look like:
>
> <lst name="collapse_counts">
>  <str name="field">col</str>
>  <lst name="results">
>    <lst name="354"> <!-- the head value of the collapsed group -->
>      <int name="collapseCount">1</int>
>      <str name="fieldValue">bbb</str>
>      <result name="collapsedDocs" numFound="1" start="0">
>        <doc>
>          <long name="id">2</long>
>          <str name="content">aaa aaa</str>
>          <str name="col">bbb</str>
>        </doc>
>      </result>
>    </lst>
>    <lst name="654">
>      <int name="collapseCount">1</int>
>      <str name="fieldValue">ccc</str>
>      <result name="collapsedDocs" numFound="1" start="0">
>        <doc>
>          <long name="id">8</long>
>          <str name="content">aaa aaa aaa sd</str>
>          <str name="col">ccc</str>
>       </doc>
>      </result>
>    </lst>
>    <lst name="654">
>      <int name="collapseCount">1</int>
>      <str name="fieldValue">xxx</str>
>      <result name="collapsedDocs" numFound="4" start="0">
>        <doc>
>          <long name="id">12</long>
>          <str name="content">aaa aaa aaa v</str>
>          <str name="col">xxx</str>
>        </doc>
>      </result>
>    </lst>
>  </lst>
> </lst>
>
> Is this what the response looks like when you use the patch?
> Thanks in advance
>
>



-- 
Met vriendelijke groet,

Martijn van Groningen

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by Marc Sturlese <ma...@gmail.com>.
><lst name="collapse_counts">
>   <str name="field">cat</str>
>    <lst name="results">
>        <lst name="009">
>            <str name="fieldValue">hard</str>
>           <int name="collapseCount">1</int>
>            <result name="collapsedDocs" numFound="1" start="0">
>                 <doc>
>                    <long name="id">008</long>
>                    <str name="content">aaa aaa</str>
>                    <str name="col">ccc</str>
>                 </doc>
>            </result>
>        </lst>
>        ...
>    </lst>
></lst>
I see. It looks like I am applying the patch wrongly somehow.
This is the complete collapse_counts response I am getting:
<lst name="collapse_counts">
  <str name="field">col</str>
  <lst name="results">
    <lst>
      <int name="collapseCount">1</int>
      <int name="collapseCount">1</int>
      <int name="collapseCount">1</int>
      <str name="fieldValue">bbb</str>
      <str name="fieldValue">ccc</str>
      <str name="fieldValue">xxx</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">2</long>
          <str name="content">aaa aaa</str>
          <str name="col">bbb</str>
        </doc>
      </result>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">8</long>
          <str name="content">aaa aaa aaa sd</str>
          <str name="col">ccc</str>
        </doc>
      </result>
      <result name="collapsedDocs" numFound="4" start="0">
        <doc>
          <long name="id">12</long>
          <str name="content">aaa aaa aaa v</str>
          <str name="col">xxx</str>
        </doc>
      </result>
    </lst>
  </lst>
</lst>

As you can see, I am getting a single <lst> tag with no name. As I understood
what you told me, I should be getting as many lst tags as there are collapsed
groups, and the name attribute of each lst should be the unique key of the
group's head document. So, if the patch was applied correctly, the response
should look like:

<lst name="collapse_counts">
  <str name="field">col</str>
  <lst name="results">
    <lst name="354"> (the unique key of the collapsed group's head document)
      <int name="collapseCount">1</int>
      <str name="fieldValue">bbb</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">2</long>
          <str name="content">aaa aaa</str>
          <str name="col">bbb</str>
        </doc>
      </result>
    </lst>
    <lst name="654">
      <int name="collapseCount">1</int>
      <str name="fieldValue">ccc</str>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">8</long>
          <str name="content">aaa aaa aaa sd</str>
          <str name="col">ccc</str>
        </doc>
      </result>
    </lst>
    <lst name="654">
      <int name="collapseCount">1</int>
      <str name="fieldValue">xxx</str>
      <result name="collapsedDocs" numFound="4" start="0">
        <doc>
          <long name="id">12</long>
          <str name="content">aaa aaa aaa v</str>
          <str name="col">xxx</str>
        </doc>
      </result>
    </lst>
  </lst>
</lst>

Is this what the response looks like when you use the patch?
Thanks in advance
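A client can tell the two shapes apart mechanically. The sketch below uses plain JDK DOM parsing (not SolrJ; the class name, method names and the "fieldValue/collapseCount" summary format are hypothetical). It reads a collapse_counts section shaped like the example quoted at the top of this message, maps each group's name attribute (assumed to be the head document id) to its fieldValue and collapseCount, and rejects an unnamed group like the one in the output above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CollapseCountsReader {

    /** Maps each group's name (assumed: head document id) to "fieldValue/collapseCount". */
    static Map<String, String> readGroups(String xml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) { // parser configuration, SAX or I/O failure
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        Map<String, String> groups = new LinkedHashMap<>();
        NodeList lists = dom.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element results = (Element) lists.item(i);
            if (!"results".equals(results.getAttribute("name"))) {
                continue; // only <lst name="results"> holds the collapse groups
            }
            for (Node n = results.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes
                }
                Element group = (Element) n;
                String headId = group.getAttribute("name"); // "" when the attribute is absent
                if (headId.isEmpty()) {
                    throw new IllegalStateException("unnamed group: response is not keyed by head id");
                }
                groups.put(headId, childText(group, "fieldValue") + "/" + childText(group, "collapseCount"));
            }
        }
        return groups;
    }

    /** Text of the direct child element whose name attribute matches, or null. */
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(((Element) n).getAttribute("name"))) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    static final String KEYED =
            "<lst name=\"collapse_counts\"><str name=\"field\">cat</str><lst name=\"results\">"
            + "<lst name=\"009\"><str name=\"fieldValue\">hard</str><int name=\"collapseCount\">1</int>"
            + "<result name=\"collapsedDocs\" numFound=\"1\" start=\"0\">"
            + "<doc><long name=\"id\">008</long></doc></result></lst></lst></lst>";

    static final String UNNAMED =
            "<lst name=\"collapse_counts\"><str name=\"field\">col</str><lst name=\"results\">"
            + "<lst><int name=\"collapseCount\">1</int><str name=\"fieldValue\">bbb</str></lst>"
            + "</lst></lst>";

    public static void main(String[] args) {
        System.out.println(readGroups(KEYED)); // {009=hard/1}
        try {
            readGroups(UNNAMED);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Only direct children of each group are inspected, so the <str> fields inside collapsedDocs cannot be mistaken for the group's own fieldValue.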


Martijn v Groningen wrote:
> 
> Hi Marc,
> 
> I'm not sure if I follow you completely, but the example you gave is
> not complete. I'm missing a few tags in your example. Let's assume the
> following response that the latest patches produce.
> 
> <lst name="collapse_counts">
>     <str name="field">cat</str>
>     <lst name="results">
>         <lst name="009">
>             <str name="fieldValue">hard</str>
>             <int name="collapseCount">1</int>
>             <result name="collapsedDocs" numFound="1" start="0">
>                  <doc>
>                     <long name="id">008</long>
>                     <str name="content">aaa aaa</str>
>                     <str name="col">ccc</str>
>                  </doc>
>             </result>
>         </lst>
>         ...
>     </lst>
> </lst>
> 
> The result list contains collapse groups. The names of the child
> elements are the collapse head ids. Everything that falls under a
> collapse head belongs to that collapse group, and thus adding the head
> document id to the field value is unnecessary. In the above example,
> the document with id 009 is the head document of the document with id 008.
> Document 009 is the one that should be displayed in the search result.
> 
> From what you have said, it seems that you properly configured the patch.
> 
> Martijn
> 
> -- 
> Met vriendelijke groet,
> 
> Martijn van Groningen
> 
> 

-- 
View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26678606.html
Sent from the Solr - Dev mailing list archive at Nabble.com.


Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by Martijn v Groningen <ma...@gmail.com>.
Hi Marc,

I'm not sure if I follow you completely, but the example you gave is
not complete. I'm missing a few tags in your example. Let's assume the
following response that the latest patches produce.

<lst name="collapse_counts">
    <str name="field">cat</str>
    <lst name="results">
        <lst name="009">
            <str name="fieldValue">hard</str>
            <int name="collapseCount">1</int>
            <result name="collapsedDocs" numFound="1" start="0">
                 <doc>
                    <long name="id">008</long>
                    <str name="content">aaa aaa</str>
                    <str name="col">ccc</str>
                 </doc>
            </result>
        </lst>
        ...
    </lst>
</lst>

The result list contains collapse groups. The names of the child
elements are the collapse head ids. Everything that falls under a
collapse head belongs to that collapse group, and thus adding the head
document id to the field value is unnecessary. In the above example,
the document with id 009 is the head document of the document with id 008.
Document 009 is the one that should be displayed in the search result.
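The grouping described above can be sketched in a few lines. This is a minimal illustration, not the patch's code: it assumes a collapse threshold of 1 and a hypothetical ranking in which doc 008 directly follows 003 (the thread does not say where 008 ranked). The class and method names are illustrative. With adjacent collapse, 008 joins the run headed by 003; with normal collapse, both 003 and 008 would fold into 001's group. Either way, keying each group by its head id says unambiguously which head owns 008.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CollapseGroupsSketch {

    /** A ranked hit: unique id plus its collapse field value. */
    static final class Hit {
        final String id;
        final String value;
        Hit(String id, String value) { this.id = id; this.value = value; }
    }

    /**
     * Collapses a ranked hit list with a threshold of 1 and returns
     * head id -> ids collapsed under that head, in rank order.
     * adjacent=true: a hit joins a group only if it directly follows a hit with the same value.
     * adjacent=false: a hit joins the group of the first hit seen with the same value.
     */
    static Map<String, List<String>> collapse(List<Hit> ranked, boolean adjacent) {
        Map<String, List<String>> collapsedByHead = new LinkedHashMap<>();
        Map<String, String> headByValue = new HashMap<>(); // normal mode: value -> head id
        String runHead = null;  // adjacent mode: head of the current run
        String runValue = null; // adjacent mode: value of the current run
        for (Hit hit : ranked) {
            String head;
            if (adjacent) {
                head = hit.value.equals(runValue) ? runHead : null;
                if (head == null) { runHead = hit.id; runValue = hit.value; }
            } else {
                head = headByValue.putIfAbsent(hit.value, hit.id); // null when the value is new
            }
            if (head == null) {
                collapsedByHead.put(hit.id, new ArrayList<>()); // hit becomes a visible head
            } else {
                collapsedByHead.get(head).add(hit.id);           // hit is collapsed under its head
            }
        }
        return collapsedByHead;
    }

    /** Marc's hits, with 008 placed after 003 purely for illustration. */
    static List<Hit> exampleHits() {
        return Arrays.asList(new Hit("001", "ccc"), new Hit("002", "aaa"),
                new Hit("003", "ccc"), new Hit("008", "ccc"), new Hit("004", "bbb"));
    }

    public static void main(String[] args) {
        System.out.println("adjacent: " + collapse(exampleHits(), true));  // {001=[], 002=[], 003=[008], 004=[]}
        System.out.println("normal:   " + collapse(exampleHits(), false)); // {001=[003, 008], 002=[], 004=[]}
    }
}
```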

From what you have said, it seems that you properly configured the patch.

Martijn

2009/12/7 Marc Sturlese <ma...@gmail.com>:
>
> Hey there, I have been testing the last patch, and either I am missing
> something or the way the collapsed documents are shown with adjacent collapse
> can sometimes be confusing:
> I am using the patch with CollapseComponent replacing QueryComponent (not
> using both at the same time):
>  <searchComponent name="query"
> class="org.apache.solr.handler.component.CollapseComponent">
> What I have noticed is this: imagine you get these results in the search:
> doc1:
>   id:001
>   collapseField:ccc
> doc2:
>   id:002
>   collapseField:aaa
> doc3:
>   id:003
>   collapseField:ccc
> doc4:
>   id:004
>   collapseField:bbb
>
> And in the collapse_counts you get:
> <int name="collapseCount">1</int>
> <str name="fieldValue">ccc</str>
> <result name="collapsedDocs" numFound="1" start="0">
> <doc>
> <long name="id">008</long>
> <str name="content">aaa aaa</str>
> <str name="col">ccc</str>
> </doc>
> </result>
>
> Now, how can I know which document is the head of doc 008? Both 001 and
> 003 could be. Wouldn't it make sense to connect the uniqueField with the
> collapsed documents in some way?
>
> Adding something to collapse_counts like:
> <int name="collapseCount">1</int>
> <str name="fieldValue">ccc</str>
> <str name="uniqueFieldId">003</str>
>
> I currently have hacked FieldValueCountCollapseCollectorFactory to return:
> <str name="fieldValue">ccc#003</str>
> but this response looks dirty...
>
> As I said, maybe I am misunderstanding something and this can already be
> known in some way. In that case, can someone tell me how?
> Thanks in advance



-- 
Met vriendelijke groet,

Martijn van Groningen

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by Marc Sturlese <ma...@gmail.com>.
Hey there, I have been testing the last patch, and either I am missing
something or the way the collapsed documents are shown with adjacent collapse
can sometimes be confusing:
I am using the patch with CollapseComponent replacing QueryComponent (not
using both at the same time):
  <searchComponent name="query"
class="org.apache.solr.handler.component.CollapseComponent">
What I have noticed is this: imagine you get these results in the search:
doc1:
   id:001
   collapseField:ccc
doc2:
   id:002
   collapseField:aaa
doc3:
   id:003
   collapseField:ccc
doc4:
   id:004
   collapseField:bbb

And in the collapse_counts you get:
<int name="collapseCount">1</int>
<str name="fieldValue">ccc</str>
<result name="collapsedDocs" numFound="1" start="0">
<doc>
<long name="id">008</long>
<str name="content">aaa aaa</str>
<str name="col">ccc</str>
</doc>
</result>

Now, how can I know which document is the head of doc 008? Both 001 and 003
could be. Wouldn't it make sense to connect the uniqueField with the collapsed
documents in some way?

Adding something to collapse_counts like:
<int name="collapseCount">1</int>
<str name="fieldValue">ccc</str>
<str name="uniqueFieldId">003</str>

I currently have hacked FieldValueCountCollapseCollectorFactory to return:
<str name="fieldValue">ccc#003</str>
but this response looks dirty...

As I said, maybe I am misunderstanding something and this can already be
known in some way. In that case, can someone tell me how?
Thanks in advance






JIRA jira@apache.org wrote:
> 
> 
>     [
> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
> ] 
> 
> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
> ----------------------------------------------------------------------
> 
> I have attached a new patch that has the following changes:
> # Added caching for the field collapse functionality. Check the [solr
> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
> field-collapsing with caching.
> # Removed the collapse.max parameter (collapse.threshold must be used
> instead). It was deprecated for a long time. 
> 
>       was (Author: martijn):
>     I have attached a new patch that has the following changes:
> # Added caching for the field collapse functionality. Check the [solr
> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure the
> field-collapsing with caching.
> # Removed the collapse.max parameter (collapse.threshold must be used
> instead). It was deprecated for a long time. 
>   
>> Field collapsing
>> ----------------
>>
>>                 Key: SOLR-236
>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: search
>>    Affects Versions: 1.3
>>            Reporter: Emmanuel Keller
>>             Fix For: 1.5
>>
>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>> collapsing-patch-to-1.3.0-ivan.patch,
>> collapsing-patch-to-1.3.0-ivan_2.patch,
>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>>
>>
>> This patch includes a new feature called "Field collapsing".
>> "Used in order to collapse a group of results with similar value for a
>> given field to a single entry in the result set. Site collapsing is a
>> special case of this, where all results for a given web site is collapsed
>> into one or two entries in the result set, typically with an associated
>> "more documents from this site" link. See also Duplicate detection."
>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>> The implementation adds 3 new query parameters (SolrParams):
>> "collapse.field" to choose the field used to group results
>> "collapse.type" normal (default value) or adjacent
>> "collapse.max" to select how many continuous results are allowed before
>> collapsing
>> TODO (in progress):
>> - More documentation (on source code)
>> - Test cases
>> Two patches:
>> - "field_collapsing.patch" for current development version
>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>> P.S.: Feedback and spelling corrections are welcome ;-)
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> 

-- 
View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26674651.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
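The JIRA comment quoted above removes collapse.max in favour of collapse.threshold. As a hedged sketch, a plain-JDK client (no SolrJ) could build such a request as below. The base URL http://localhost:8983/solr/select, the query string, and the assumption that collapse.threshold takes the same integer that collapse.max used to take are all illustrative; the FieldCollapsing wiki page is the authority on which parameters a given patch version accepts.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CollapseRequestSketch {

    /** Joins URL-encoded key=value pairs onto a base URL, preserving insertion order. */
    static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    /** Hypothetical collapse request; parameter names come from the thread, values are examples. */
    static Map<String, String> collapseParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "content:aaa");
        params.put("collapse.field", "col");     // field used to group results
        params.put("collapse.type", "adjacent"); // "normal" is the default
        params.put("collapse.threshold", "1");   // takes over from the removed collapse.max
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr/select", collapseParams()));
    }
}
```

Existing clients that still send collapse.max only need that one key renamed; the other collapse parameters are unchanged.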


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment:     (was: field_collapsing_1.3.patch)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
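The description above defines collapse.max as how many continuous results are allowed before collapsing, and collapse.type as normal or adjacent. A minimal sketch of one reading of those semantics, not the patch's code (class and method names are hypothetical): with adjacent collapsing only consecutive hits sharing a value count toward the limit, while with normal collapsing every earlier hit with the same value counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollapseThresholdSketch {

    /**
     * Returns the ids that remain visible. Each hit is {id, collapseValue}, in rank order.
     * At most max hits per group stay visible; the rest are collapsed.
     */
    static List<String> visible(List<String[]> hits, int max, boolean adjacent) {
        List<String> kept = new ArrayList<>();
        Map<String, Integer> seenPerValue = new HashMap<>(); // normal: hits per value so far
        String runValue = null;                              // adjacent: value of the current run
        int runLength = 0;                                   // adjacent: length of the current run
        for (String[] hit : hits) {
            String id = hit[0];
            String value = hit[1];
            int position;
            if (adjacent) {
                runLength = value.equals(runValue) ? runLength + 1 : 1;
                runValue = value;
                position = runLength;
            } else {
                position = seenPerValue.merge(value, 1, Integer::sum);
            }
            if (position <= max) {
                kept.add(id);
            }
        }
        return kept;
    }

    static List<String[]> exampleHits() {
        return Arrays.asList(
                new String[] {"1", "site-a"}, new String[] {"2", "site-a"},
                new String[] {"3", "site-a"}, new String[] {"4", "site-b"},
                new String[] {"5", "site-a"});
    }

    public static void main(String[] args) {
        System.out.println(visible(exampleHits(), 2, true));  // [1, 2, 4, 5]
        System.out.println(visible(exampleHits(), 2, false)); // [1, 2, 4]
    }
}
```

With a limit of 2, adjacent mode keeps hit 5 because hit 4 broke the run of site-a results; normal mode drops it because site-a has already used both of its slots.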


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839657#action_12839657 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

That makes sense. I initially made it an array to maintain the document order for the scores, but that order is already captured by the OpenBitSet. I think a Map is a good idea.
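The point above is that the bitset already records which documents were collected, in ascending doc-id order, so scores need not sit in a parallel array. A minimal sketch, using java.util.BitSet as a stand-in for Lucene's OpenBitSet and a HashMap<Integer, Float> for the scores; the class and method names are hypothetical, not the patch's.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollapsedScoresSketch {

    private final BitSet docs = new BitSet();                   // collected docs; iterates in doc-id order
    private final Map<Integer, Float> scores = new HashMap<>(); // sparse score lookup, no ordering needed

    void collect(int docId, float score) {
        docs.set(docId);
        scores.put(docId, score);
    }

    /** Score of a single document, or null if it was not collected. */
    Float score(int docId) {
        return scores.get(docId);
    }

    /** Scores in ascending doc-id order, with the order taken from the bitset. */
    List<Float> scoresInDocOrder() {
        List<Float> ordered = new ArrayList<>();
        for (int d = docs.nextSetBit(0); d >= 0; d = docs.nextSetBit(d + 1)) {
            ordered.add(scores.get(d));
        }
        return ordered;
    }

    public static void main(String[] args) {
        CollapsedScoresSketch s = new CollapsedScoresSketch();
        s.collect(42, 1.5f);
        s.collect(7, 0.25f);
        s.collect(1000, 3.0f);
        System.out.println(s.scoresInDocOrder()); // [0.25, 1.5, 3.0]
        System.out.println(s.score(42));          // 1.5
    }
}
```

The map stays proportional to the number of collected documents, whereas an array indexed by doc id would have to be sized to the largest doc id in the index.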

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792510#action_12792510 ] 

Mark Miller commented on SOLR-236:
----------------------------------

bq. (Faceting got a 50 times perf boost in 1.4)

No, it didn't. Certain cases have gotten a boost (I think you might be referring to multi-field faceting cases?), and general faceting has always been relatively fast and scalable.

I'm against committing features to trunk with a warning that the feature is not ready for trunk.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728050#action_12728050 ] 

David Smiley commented on SOLR-236:
-----------------------------------

Auto-reply: I'm on Vacation this week.


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Stephen Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793644#action_12793644 ] 

Stephen Weiss commented on SOLR-236:
------------------------------------

Quick note on the collapse cache: we just went into production with 1.4 and right away we had to turn off the collapse cache. This was with the 1.4 dist and the patch from 12/12. With the cache enabled, RAM consumption was through the roof on the production servers; I guess with the variety of queries coming in, it filled up very fast. It almost maxed out a machine with 18GB devoted to Jetty in about 20 minutes. We just used the sample config (maxSize=512), and it looks like there were about 60 entries in the cache before we restarted. We would see memory usage jump by as much as 2% after just one query.

Without the cache the performance is still quite good (far better than what we had before), so we're not too bothered, but it may indicate that the cache needs more optimization... Generally our memory consumption rarely goes over 50% on this machine unless we have a lot of commits coming in. The cache *did* provide some performance benefit on some of the queries that return large numbers of results (1M+), so it would be nice to have. Of course, it's possible that with our index these levels of RAM consumption would be unavoidable. I'm not sure if there are any further specifics I could provide that would be helpful; let me know.
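A cache limited by entry count (such as maxSize=512 in the sample config) bounds how many entries it holds, not how many bytes; in the report above roughly 60 entries were enough to nearly max out an 18GB Jetty. As a hypothetical alternative, not something the patch provides, here is a minimal sketch of an LRU cache bounded by an estimated byte weight. WeightedLruCache and the weigher function are illustrative names, and the weight estimate is up to the caller:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical LRU cache bounded by estimated bytes rather than entry count. */
public class WeightedLruCache<K, V> {
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private final ToLongFunction<V> weigher;
    private final long maxWeight;
    private long totalWeight;

    public WeightedLruCache(long maxWeight, ToLongFunction<V> weigher) {
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    public synchronized V get(K key) {
        return map.get(key); // also marks the entry as recently used
    }

    public synchronized void put(K key, V value) {
        V old = map.put(key, value);
        if (old != null) totalWeight -= weigher.applyAsLong(old);
        totalWeight += weigher.applyAsLong(value);
        // Evict least recently used entries until the total is back under budget.
        // An entry that alone exceeds the budget is evicted as well.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (totalWeight > maxWeight && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            totalWeight -= weigher.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    public synchronized int size() { return map.size(); }

    public synchronized long weight() { return totalWeight; }
}
```

For example, `new WeightedLruCache<String, int[]>(budgetBytes, v -> 4L * v.length)` would weigh cached int[] doc-id arrays by their approximate payload size, so a few huge results for 1M+ hit queries push out older entries instead of accumulating.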

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Marc Menghin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789381#action_12789381 ] 

Marc Menghin commented on SOLR-236:
-----------------------------------

Hi,

I'm new to Solr, so apologies if my setup is still incomplete. I got everything from Solr SVN and applied the patch (field-collapse-5.patch	2009-12-08 09:43 PM). When I search I get an NPE because I don't seem to have a cache for the collapsing. The code tries to add an entry to the cache, but no cache exists at that point: AbstractDocumentCollapser.collapse checks for this, yet AbstractDocumentCollapser.createDocumentCollapseResult still uses the cache later. I suppose that's a bug? Or is something wrong on my side?

Exception I get is:

java.lang.NullPointerException
	at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:278)
	at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
	at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
	at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
	at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

I fixed it locally by only adding something to the cache if there is one (fieldCollapseCache != null). But I'm not very familiar with the code, so I'm not sure if that's a good/right way to fix it.

Thanks,
Marc
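Marc's local fix amounts to guarding the cache write with the same null check already used on the read path. A minimal sketch of that pattern; NullSafeCollapser, collapse and doCollapse are hypothetical stand-ins, not the patch's AbstractDocumentCollapser API:

```java
import java.util.Map;

/** Hypothetical stand-in for a collapser with an optional cache. */
public class NullSafeCollapser {
    private final Map<String, int[]> fieldCollapseCache; // null when no cache is configured

    public NullSafeCollapser(Map<String, int[]> cacheOrNull) {
        this.fieldCollapseCache = cacheOrNull;
    }

    public int[] collapse(String cacheKey, int[] uncollapsed) {
        if (fieldCollapseCache != null) {
            int[] cached = fieldCollapseCache.get(cacheKey);
            if (cached != null) return cached;
        }
        int[] result = doCollapse(uncollapsed);
        // Guard the write as well as the read, so a missing cache cannot cause an NPE.
        if (fieldCollapseCache != null) {
            fieldCollapseCache.put(cacheKey, result);
        }
        return result;
    }

    private int[] doCollapse(int[] docs) {
        return docs.clone(); // placeholder for the real collapsing work
    }
}
```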

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Robert Zotter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848978#action_12848978 ] 

Robert Zotter commented on SOLR-236:
------------------------------------

What are the required steps to get this patch working with a clean 1.4? Is it even compatible? I've read in the above comments that the 12/12 field-collapse-5.patch will patch correctly but has horrible memory bugs. Have there been any updates on this? Recommendations anyone?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Dima Brodsky (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535999 ] 

Dima Brodsky commented on SOLR-236:
-----------------------------------

Hi,

I am new to the list and to Solr, so I apologize in advance if I say something silly.

I have been playing with the field collapse patch, and I have a couple of questions and have noticed a couple of issues. What is the intended use / audience for the field collapsing patch? One of the issues I see is that the sort order is changed during normal field collapsing, and this causes problems if I want the results ordered by relevancy. Another issue is that the backfilling of the results, if there are not enough, is done from the deduped results rather than by getting more results from the index. Is this by design?

Thanks!!
ttyl
Dima
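The behaviour Dima asks about, keeping relevancy order and filling a page from further index results rather than from documents that were collapsed away, can be sketched as a single pass over a ranked iterator. This is a hypothetical illustration (BackfillSketch, Hit and firstPage are made-up names) that keeps one hit per group; it is not how the patch is implemented:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class BackfillSketch {
    /** A ranked hit; ranking is implied by iteration order. */
    record Hit(int docId, String group) {}

    /**
     * Collects up to `rows` hits, at most one per group, in the order the
     * ranked iterator yields them. When a hit is collapsed away, the next hit
     * is pulled from the iterator (a stand-in for fetching more results from
     * the index) instead of re-using already collapsed documents.
     */
    static List<Hit> firstPage(Iterator<Hit> ranked, int rows) {
        Set<String> groupsSeen = new HashSet<>();
        List<Hit> page = new ArrayList<>();
        while (page.size() < rows && ranked.hasNext()) {
            Hit h = ranked.next();
            if (groupsSeen.add(h.group())) { // first (highest-ranked) hit of its group
                page.add(h);
            }
        }
        return page;
    }
}
```

Because the page is built in iteration order, relevancy order is preserved by construction.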


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829727#action_12829727 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

If you look into AbstractDocumentCollapser#createDocumentCollapseResult() you will see that collapseResult will never be null. Therefore I think the null check is not necessary.
I think the following code is sufficient:
{code}
DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
      collapseResult.getCollapsedDocset(),
      rb.getSortSpec().getSort(),
      rb.getSortSpec().getOffset(),
      rb.getSortSpec().getCount(),
      rb.getFieldFlags());
{code}
Specifying the filters is also unnecessary, because they were already taken into account when the uncollapsed docset was created.
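Martijn's point about the filters rests on set algebra: the collapsed docset is a subset of the uncollapsed docset, which already had the filters applied, so intersecting it with the filters again cannot remove anything. A small illustration with java.util.BitSet and made-up doc ids (FilterRedundancySketch and intersect are hypothetical names, not Solr's DocSet API):

```java
import java.util.BitSet;

public class FilterRedundancySketch {
    /** Returns a ∩ b without modifying either argument. */
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.and(b);
        return r;
    }

    public static void main(String[] args) {
        BitSet queryMatches = new BitSet();
        queryMatches.set(0, 10);                          // docs 0..9 match the query
        BitSet filters = new BitSet();
        for (int d = 0; d < 10; d += 2) filters.set(d);   // the fq keeps even docs
        BitSet uncollapsed = intersect(queryMatches, filters); // {0, 2, 4, 6, 8}
        BitSet collapsed = (BitSet) uncollapsed.clone();
        collapsed.clear(2);
        collapsed.clear(6);                               // collapsing drops docs: {0, 4, 8}
        // Re-applying the filters leaves the collapsed set unchanged.
        System.out.println(collapsed.equals(intersect(collapsed, filters))); // true
    }
}
```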

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Assigned: (SOLR-236) Field collapsing

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic reassigned SOLR-236:
-------------------------------------

    Assignee:     (was: Otis Gospodnetic)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791986#action_12791986 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Shalin.
1. This configuration also looks fine to me. The reason I added <fieldCollapsing> ... </fieldCollapsing> was to be able to support sharing of collapseCollectorFactory instances between different collapse components in the near future. Do you think that is a valid reason? Or do you think that collapseCollectorFactories shouldn't be shared?
2. I forgot to create that, so a good thing you added it.
3. I think leaving out those changes will make the distributed integration tests fail (Haven't checked it).

Noble. 
1. The reason I gave a name to collapseCollectorFactory was to be able to use an instance twice for different collapse components.
2. Moving the classname to the class attribute looks better than putting it in the function element. So I think we should change that.

Grant. 
1. I think you are also referring to sharding. Sharding is supported, but not in a very elegant way: you need to partition your documents across your shards so that all documents belonging to a collapse group end up on one shard. To be honest, I have never tested the patch on a corpus of 100M docs.
2. Field collapsing can impact the search time in a very negative way. I wrote a small paragraph about it on my [blog|http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/].
3. The first two response examples are for 'old' patches. The last response example is for the more recent patches (and current patch). 
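Since sharding only works when every document of a collapse group lands on the same shard, one way to guarantee that at index time, assuming you route documents to shards yourself, is to derive the shard from the collapse-field value. A minimal sketch; CollapseShardRouter and shardFor are hypothetical names:

```java
import java.util.List;

public class CollapseShardRouter {
    /** Picks a shard from the collapse-field value so a whole group lands on one shard. */
    static int shardFor(String collapseValue, int numShards) {
        // floorMod keeps the result in [0, numShards) even when hashCode() is negative.
        return Math.floorMod(collapseValue.hashCode(), numShards);
    }

    public static void main(String[] args) {
        List<String> siteOfEachDoc = List.of("a.com", "b.com", "a.com", "c.com");
        for (String site : siteOfEachDoc) {
            System.out.println(site + " -> shard " + shardFor(site, 3));
        }
    }
}
```

String.hashCode() is defined by the Java specification, so the mapping is stable across JVMs; it does change if the number of shards changes, which would require reindexing.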

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Dave Redford (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694851#action_12694851 ] 

Dave Redford edited comment on SOLR-236 at 4/10/09 6:46 PM:
------------------------------------------------------------

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request.

[Update: this is only an issue when both standard results and collapse results are present - which I was using for testing]

eg: 
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives wrong ordering (note: Id is our unique Id)

but adding another field - even a bogus one - works.
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Also using an fq makes it work 
eg:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that great so far...
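To reproduce the ordering problem outside a browser, the request can be built programmatically and the returned scores checked. A minimal sketch, assuming the default sort (score descending), so correctly ordered results have non-increasing scores; OrderingCheck and its methods are hypothetical names, and fetching and parsing the response is left out:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class OrderingCheck {
    /** The parameters from the failing request above, in the same order. */
    static Map<String, String> reproParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "ford");
        p.put("version", "2.2");
        p.put("start", "0");
        p.put("rows", "10");
        p.put("indent", "on");
        p.put("fl", "Id,score");
        p.put("collapse.field", "PrimaryId");
        p.put("collapse.max", "1");
        return p;
    }

    /** URL-encodes each key and value and joins them with '&'. */
    static String queryString(Map<String, String> params) {
        StringJoiner sj = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sj.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                 + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sj.toString();
    }

    /** True if scores never increase, i.e. results are in score-descending order. */
    static boolean isNonIncreasing(float[] scores) {
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[i - 1]) return false;
        }
        return true;
    }
}
```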


      was (Author: dredford):
    There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request.

eg: 
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives wrong ordering (note: Id is our unique Id)

but adding another field - even a bogus one - works.
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Also using an fq makes it work 
eg:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that great so far...

  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Paul Nelson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753335#action_12753335 ] 

Paul Nelson commented on SOLR-236:
----------------------------------

Hey All:  Just upgraded to 1.4 to get the new patch (many thanks, Martijn). The new algorithm appears to be sensitive to the size and complexity of the query (rather than simply the count of documents) - should this be the case? Unfortunately, we have rather large and complex queries with dozens of terms and several phrases, and while these queries are <0.5sec without collapsing, they are 3-4sec with collapsing.

I'm wondering if the filter cache (or some other cache) might be able to help with this situation?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792199#action_12792199 ] 

Grant Ingersoll commented on SOLR-236:
--------------------------------------

{quote}
Grant, this patch may not be perfect but I think we all agree that it is a great start. This is stable, used by many and has been well supported by the community. This is also a large patch and as I have known from my DataImportHandler experience, maintaining a large patch is quite a pain (and DataImportHandler didn't even touch the core). How about we commit this (after some review, of course), mark this as experimental (no guarantees of any sort) and then start improving it one issue at a time? Alternately, if you are not comfortable adding it to trunk, we can commit this on a branch and merge into trunk later.
{quote}

Which is why it should not go in unless it is ready. The fact that a large patch has been around for a while and is "hard to maintain" is no reason to commit something that isn't right. The problem with committing something that isn't ready is that we then have to do even more work to maintain it, which takes away from the opportunity to make it better.

As for the voting and the popularity, I think that is all the more reason why it needs to be done right and not just be a "good start". With this many eyes on it, it should be easy to get people testing it and giving feedback.

If the issue is that the patch is too big, then perhaps it needs to be broken up into smaller pieces that lay the framework for field collapsing to work.


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496617 ] 

Otis Gospodnetic commented on SOLR-236:
---------------------------------------

Question:
Do you need collapse=true when you can detect whether collapse.field has been specified or not?


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 4 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many consecutive results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Aytek Ekici (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765076#action_12765076 ] 

Aytek Ekici edited comment on SOLR-236 at 10/13/09 6:46 AM:
------------------------------------------------------------

Hi all,
I just applied "field-collapse-5.patch" and I think there is a problem with filter queries.

Here it is:

1- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284

2- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912

3- http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419

4- When using "q" instead of "fq", i.e. http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (which is the only correct number)

As I understand it, instead of applying "AND" across the filter queries, it applies "OR". I checked http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as 3rd one)

Any idea how to fix this?
Thx.

      was (Author: aytek):
    Hi all,
I just applied "field-collapse-5.patch" and I think there is a problem with filter queries.

Here it is:

1- Use one(first) filter

http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284

2- Use second filter
http://10.231.14.252:8080/myindex/select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912

3- Use both filters
http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419

4- When using "q" instead of "fq", i.e. http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (which is the only correct number)

As I understand it, instead of applying "AND" across the filter queries, it applies "OR". I checked http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as 3rd one)

Any idea how to fix this?
Thx.
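Solr intersects multiple fq parameters. That means the count for query 3 can never exceed the smaller single-filter count (6284). The reported 19419 lies between the larger count (16912) and the sum of both counts (6284 + 16912 = 23196), which is exactly the range a union would produce. The sketch below models each filter as a set of doc ids using java.util.BitSet to show the two ways of combining them. The class name and doc ids are made up for illustration.

```java
import java.util.BitSet;

/**
 * Sketch: how multiple filter queries should combine. Each fq yields a set of
 * matching doc ids; Solr intersects them (AND). The doc ids here are made up.
 */
final class FilterCombineSketch {

    // Expected behaviour: a document must match every filter (requires at least one filter).
    static BitSet intersect(BitSet... filters) {
        BitSet result = (BitSet) filters[0].clone(); // clone so the inputs are not modified
        for (int i = 1; i < filters.length; i++) {
            result.and(filters[i]);
        }
        return result;
    }

    // The behaviour reported above: a document matching any filter is counted.
    static BitSet union(BitSet... filters) {
        BitSet result = new BitSet();
        for (BitSet f : filters) {
            result.or(f);
        }
        return result;
    }
}
```

When checking a patched build, the intersected cardinality should never be larger than the smallest individual filter count.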
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777649#action_12777649 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Thomas, the method that cannot be found ( SolrIndexSearcher.getDocSet(...) ) is part of the patch. So if the patch was successfully applied, this should not happen. 
When I released the latest patch I only tested against the Solr trunk, but I have tried the following to verify that the patch works with the 1.4.0 release:
* Downloaded the 1.4.0 release from the Solr site
* Applied the patch
* Executed: ant clean dist example
* In the example config (example/solr/conf/solrconfig.xml) I added the following line under the standard request handler:
{code:xml}<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />{code}
* Started Jetty with Solr with the following command: java -jar start.jar
* Added the example data to Solr with the following command in the exampledocs dir: ./post.sh *.xml
* Browsed to the following URL: http://localhost:8983/solr/select/?q=*:*&collapse.field=inStock and saw that the result was collapsed on the inStock field.
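If you script the same check, note that values such as *:* or a range like lat:[37.2 TO 39.8] contain characters that should be URL-encoded. The following is a minimal sketch that assumes a plain HTTP GET against the example server. The helper name is hypothetical, and only collapse.field comes from the patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for building a select URL with collapse parameters. */
final class CollapseUrlSketch {

    // Builds a select URL from parameters, URL-encoding every name and value.
    static String selectUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return baseUrl + "?" + query;
    }

    // application/x-www-form-urlencoded encoding: ':' -> %3A, '[' -> %5B, ' ' -> '+'.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, passing q=*:* and collapse.field=inStock (in a LinkedHashMap, to keep the order) produces http://localhost:8983/solr/select/?q=*%3A*&collapse.field=inStock.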

It seems that everything is running fine. Can you tell something about how you deployed Solr on your machine?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-236:
-------------------------------

    Attachment: SOLR-236-FieldCollapsing.patch

I updated the patch so that it applies cleanly to trunk. While I was at it, I:
* fixed a few spelling errors
* made the "collapse.type" parameter parsing throw an error if the passed value is unknown (rather than quietly using 'normal')
* changed the patch name to include the issue number. As we update the patch, let's reuse this same name so it is easy to tell which one is the most current.
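The stricter collapse.type parsing can be sketched as follows. The enum and method are hypothetical names, not the patch's code. Inside Solr, a bad parameter would more likely be reported as a SolrException with a BAD_REQUEST code. The sketch uses IllegalArgumentException so that it stays self-contained.

```java
import java.util.Locale;

// Hypothetical enum; the patch's actual type and names may differ.
enum CollapseType {
    NORMAL, ADJACENT;

    // A missing parameter falls back to NORMAL (the documented default);
    // an unrecognised value is rejected instead of silently becoming NORMAL.
    static CollapseType parse(String value) {
        if (value == null) {
            return NORMAL;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "normal":
                return NORMAL;
            case "adjacent":
                return ADJACENT;
            default:
                throw new IllegalArgumentException("Unknown collapse.type: " + value);
        }
    }
}
```

A missing parameter still falls back to normal. A typo such as "nomral", however, now fails loudly instead of silently changing the collapse behaviour.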

I also made a wiki page so there are direct links to interesting queries:
http://wiki.apache.org/solr/FieldCollapsing

- - - - - - -

Again, I will leave any discussion about the Lucene implementation to others more qualified and will just focus on the response interface.

Currently if you send the query:
http://localhost:8983/solr/select/?q=*:*&collapse.field=cat&collapse.max=1&collapse.type=normal

you get a response that looks like:
<lst name="collapse_counts">
 <int name="hard">1</int>
 <int name="electronics">2</int>
 <int name="memory">2</int>
 <int name="monitor">1</int>
 <int name="software">1</int>
</lst>

It looks like that says: for the field 'cat', there is one more result with cat=hard, 2 more results with cat=electronics, ...

How is a client supposed to know how to deal with that?  "hard" is the tokenized version of "hard drive"; unless it were a 'string' field, the client would need to know how to do that tokenization, or the response needs to change.

From a client, it would be more useful to have output that looked something like:
<lst name="collapse_counts">
 <str name="field">cat</str>
 <lst name="doc">
  <int name="SP2514N">1</int>
  <int name="6H500F0">1</int>
  <int name="VS1GB400C3">2</int>
  <int name="VS1GB400C3">1</int>
 </lst>
 <lst name="count">
  <int name="hard">1</int>
  <int name="electronics">1</int>
  <int name="memory">2</int>
  <int name="monitor">1</int>
 </lst>
</lst>

"field" says what field was collapsed on,
"doc" is a map of doc id -> how many more collapsed on that field
"count" is a map of 'token'-> how many more collapsed on that field

This way, the client would know what collapse counts apply to which documents without knowing about the schema.

thoughts?
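The following minimal sketch shows how a server could fill both the proposed "doc" and "count" maps in one pass over the ranked hits. It assumes normal collapsing. It also assumes that each group's collapsed count is credited to the group's first (highest-ranked) surviving document. The class, field, and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical computation of per-value and per-doc collapse counts. */
final class CollapseCountsSketch {

    // One ranked hit: its unique key and its value in the collapse field.
    static final class Hit {
        final String id;
        final String value;
        Hit(String id, String value) { this.id = id; this.value = value; }
    }

    // "How many more" per collapse value ("count") and per surviving doc id ("doc").
    static final class Counts {
        final Map<String, Integer> byValue = new LinkedHashMap<>();
        final Map<String, Integer> byDoc = new LinkedHashMap<>();
    }

    static Counts count(List<Hit> rankedHits, int maxPerGroup) {
        Map<String, Integer> seen = new LinkedHashMap<>();
        Map<String, String> headDoc = new LinkedHashMap<>(); // first surviving doc per value
        Counts counts = new Counts();
        for (Hit hit : rankedHits) {
            int n = seen.merge(hit.value, 1, Integer::sum);
            if (n == 1) {
                headDoc.put(hit.value, hit.id);
            }
            if (n > maxPerGroup) {
                // This hit is collapsed away: credit its group and the group's head document.
                counts.byValue.merge(hit.value, 1, Integer::sum);
                counts.byDoc.merge(headDoc.get(hit.value), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Consider collapse.max=1 with the ranked hits SP2514N (hard), 6H500F0 (hard), VS1GB400C3 (memory), followed by two more memory hits. byValue becomes {hard=1, memory=2} and byDoc becomes {SP2514N=1, VS1GB400C3=2}. A group with nothing collapsed, such as a single monitor hit, gets no entry. The client therefore only needs the doc id to look up its count.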






> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many consecutive results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

I have updated the field collapse patch with the following:
1. Added the return-collapsed-documents feature. When the parameter _collapse.includeCollapsedDocs_ is set to true, the collapsed documents are returned per distinct field value. When this feature is enabled, a collapsedDocs element is added to the field collapse part of the response. It looks like this:
{code:xml}
<lst name="collapsedDocs">
  <result name="Amsterdam" numFound="2" start="0">
	<doc>
	 <str name="id">262701</str>
	 <str name="title">Bitterzoet, 100% Halal, Appletree Records &amp; Deux d'Amsterdam presents</str>
	</doc>
	<doc>
	 <str name="id">327511</str>
	 <str name="title">Salsa Danscafé</str>
	</doc>
  </result>
 </lst>
{code}
It is also possible to return only specific fields with the _collapse.includeCollapsedDocs.fl_ parameter. It expects a comma-delimited list of field names, just like the normal fl parameter. 

This feature can dramatically impact performance, because a group can potentially contain many documents, all of which have to be retrieved from the index and transported over the wire. So it is certainly wise to use it in combination with the fl parameter. 
2. Added SolrJ support for the collapsed documents feature. 
3. Added the performance improvements that Abdul suggested.
4. The debug information is now *not* returned by default. When the parameter _collapse.debug_ is set to true, the debug information is returned.
5. When field collapsing is done on a field that is multivalued or tokenized, an exception is thrown. I have chosen to do this because collapsing on such fields leads to unexpected results. For example, when a field is tokenized, only the last token of the field can be retrieved from the fieldcache (the fieldcache is used to retrieve field values from the index in a cached manner, in order to group documents by distinct field value). This results in collapsing only on the last token of a field value instead of the complete field value. Multivalued fields have similar behaviour; in addition, for multivalued fields the Lucene FieldCache throws an exception when there are more tokens for a field than documents. Personally I think that throwing an exception is better than having unexpected results; at least it is clear that something field collapse related is wrong.
6. When doing a normal field collapse and not sorting on score, the Solr caching mechanism is now used. Unfortunately this was previously not the case.
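The following is a simplified model of why a tokenized field collapses on the wrong value. Suppose the per-document cache is filled by walking the inverted index term by term in sorted order, and each visit overwrites the document's slot. Then every document ends up holding just one token. In this model, "last token" means the token that sorts last. That visiting order is an assumption of the model, not a description of Lucene's exact code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Simplified model of a single-valued, per-document field cache built from an
 * inverted index: terms are visited in sorted order and each visit overwrites
 * the document's slot. It is not Lucene's code, only the effect described above.
 */
final class FieldCacheSketch {

    // docs.get(i) holds the tokens of document i's field value.
    static String[] build(List<List<String>> docs) {
        // Inverted index: term -> ids of docs containing it, iterated in term order.
        Map<String, TreeSet<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            for (String term : docs.get(doc)) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(doc);
            }
        }
        String[] valueByDoc = new String[docs.size()];
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                valueByDoc[doc] = e.getKey(); // later terms overwrite earlier ones
            }
        }
        return valueByDoc;
    }
}
```

With tokens [hard, drive], [hard, disk] and [memory], both "hard drive" and "hard disk" end up as "hard". They would therefore be collapsed into the same group, which mirrors the tokenized "hard" value in the collapse_counts example above.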

@Paul
When doing non-adjacent collapsing (aka normal collapsing), the Solr caches are not being used. The current patch uses the Solr caches when doing a search without scoring, but the most common case is of course field collapsing with sorting on score. This is because the non-adjacent field collapse algorithm requires the score of all results, which is collected with a Lucene collector. The search method on the SolrIndexSearcher that accepts a collector does not have caching capabilities. In the next patch I will fix this problem, so that normal field collapse search uses the Solr caches as it should. The adjacent collapsing algorithm *does* use the Solr caches, but the algorithm is much slower than non-adjacent collapsing.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556732#action_12556732 ] 

Ryan McKinley commented on SOLR-236:
------------------------------------

Charles - try applying Doug Steigerwald's latest patch:   field_collapsing_dsteigerwald.diff 

I have not tested it, but it does apply without errors.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Bojan Smid (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599714#action_12599714 ] 

Bojan Smid commented on SOLR-236:
---------------------------------

Hi Oleg. I'll look into this also. In case you have any working code, you can mail it to me, and I'll see what can be reused.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Traeger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716433#action_12716433 ] 

Thomas Traeger commented on SOLR-236:
-------------------------------------

Strange, maybe something went wrong during the build and CollapseComponent is not included in the war. You might look inside solr.war and check for CollapseComponent.class:

{noformat}
cd apache-solr-1.3.0/example/webapps
unzip solr.war
cd WEB-INF/lib
unzip apache-solr-core-1.3.0.jar
cd org/apache/solr/handler/component/
{noformat}
Is the file CollapseComponent.class there?
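You can do the same check without unpacking the jar by hand. The following is a minimal sketch using java.util.jar.JarFile. The class name is hypothetical. The path would be the core jar extracted from solr.war as above, e.g. WEB-INF/lib/apache-solr-core-1.3.0.jar.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper: checks whether a class file is packaged in a jar. */
final class JarCheckSketch {

    // Converts a binary class name to its entry path and looks it up in the jar.
    static boolean containsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }
}
```

If the patched build was packaged correctly, containsClass("WEB-INF/lib/apache-solr-core-1.3.0.jar", "org.apache.solr.handler.component.CollapseComponent") should return true.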


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Stephen Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679618#action_12679618 ] 

Stephen Weiss commented on SOLR-236:
------------------------------------

Unfortunately I don't think that will work for us.  The collapse.maxdocs seems to collapse the oldest documents in the index - but we sort from newest to oldest, so effectively the newest documents in the index are just left out.  Not only do they not collapse but they don't appear at all.  If this is the only solution then we will have to stop using the patch... and unfortunately this means in general we will probably have to stop using Solr.  The company has already made clear that this functionality is required, and especially since it has been working now for several months they will be very unlikely to accept that they can't have it anymore.

Anyway I don't want to give up yet...

I'm not convinced this is really a problem of running out of the memory needed to complete the operation, because it only started doing this very recently. How could it run for 3 months with 2GB of RAM without any trouble, and now fail even with 3GB of RAM? It's not like we just added those 200000 documents yesterday. They have accumulated over the past few months, and in the past 3 days we've only added perhaps 20,000 documents. 20,000 more documents (with barely any new search terms at all) means it needs more than 1GB of memory beyond what it was already using? If we grow by 25% every year, that means by December we will need 50GB of RAM in the machine.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719360#action_12719360 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Shekar, can you show how you configured local solr and field collapsing in the solrconfig.xml file?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Charles Hornberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563791#action_12563791 ] 

Charles Hornberger commented on SOLR-236:
-----------------------------------------

It seems SearchHandler was simply moved down into the org.apache.solr.handler.component package as part of r610426 - http://svn.apache.org/viewvc?view=rev&revision=610426

You should be able to modify the import statements in field_collapsing_dsteigerwald.diff to make it work, no?
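The import rewrite could be automated along the lines of the Java sketch below, which is for illustration only. It makes two assumptions. First, the old import was org.apache.solr.handler.SearchHandler, the location before r610426; this has not been checked against the diff. Second, the patch uses ordinary diff lines prefixed with '+', '-' or a space. ImportFixer and fixImports are hypothetical names. Classes in the old org.apache.solr.handler package that used SearchHandler without an import would still need an import added by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImportFixer {
    // Assumed old location of SearchHandler, before r610426 moved it.
    static final String OLD = "org.apache.solr.handler.SearchHandler";
    // New location after the move.
    static final String NEW = "org.apache.solr.handler.component.SearchHandler";

    // Matches "import <OLD>;" at the start of a line, optionally preceded by a diff marker.
    private static final Pattern OLD_IMPORT = Pattern.compile(
            "^([-+ ]?import\\s+)" + Pattern.quote(OLD) + ";", Pattern.MULTILINE);

    // Returns the patch text with every import of the old class pointed at the new package.
    static String fixImports(String patchText) {
        return OLD_IMPORT.matcher(patchText)
                .replaceAll("$1" + Matcher.quoteReplacement(NEW) + ";");
    }
}
```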

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503185 ] 

Yonik Seeley commented on SOLR-236:
-----------------------------------

We facet on the complete set of documents matching a query, even when the user only requests the top 10 matches.  It seems we should do the same here.  The set of documents is the same, the only difference is what "top" documents are returned.
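The issue description defines collapse.type (normal or adjacent) and collapse.max. A collapse pass over the complete sorted match set, as with faceting, could look like the minimal Java sketch below. It assumes the matches are already in final sort order and that each document's collapse-field value is at hand. It is an illustration, not the patch's code, and CollapseSketch and collapse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollapseSketch {
    // docIds/values: every matching document, already in final sort order.
    // normal mode:   keep at most `max` docs per field value anywhere in the list.
    // adjacent mode: keep at most `max` docs per run of consecutive equal values.
    // collapsedCounts receives, per field value, how many docs were collapsed away.
    static List<Integer> collapse(int[] docIds, String[] values, int max, boolean adjacent,
                                  Map<String, Integer> collapsedCounts) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>(); // per-value totals (normal mode)
        String prev = null;                          // previous value (adjacent mode)
        int run = 0;                                 // length of the current run
        for (int i = 0; i < docIds.length; i++) {
            String v = values[i];
            int count;
            if (adjacent) {
                run = v.equals(prev) ? run + 1 : 1;
                prev = v;
                count = run;
            } else {
                count = seen.merge(v, 1, Integer::sum);
            }
            if (count <= max) {
                kept.add(docIds[i]);
            } else {
                collapsedCounts.merge(v, 1, Integer::sum);
            }
        }
        return kept;
    }
}
```

Take documents 1 to 5 with values a, a, b, a, a and max=1. Normal mode keeps documents 1 and 3. Adjacent mode keeps 1, 3 and 4, because the second run of a starts a new group. Because the input covers every match rather than just the requested page, the collapsed counts reflect the whole docset.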

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Assigned: (SOLR-236) Field collapsing

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar reassigned SOLR-236:
------------------------------------------

    Assignee: Shalin Shekhar Mangar

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Patrick Jungermann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797716#action_12797716 ] 

Patrick Jungermann commented on SOLR-236:
-----------------------------------------

Hi all,

We are using Solr's trunk with the latest patch from {{2009-12-24 09:54 AM}}. The index contains ~3.5 million documents with string-based identifiers up to 50 characters long.

The document that was the top result of our prefix query without collapsing did not even appear in the top 10 results with collapsing. We are using the option {{collapse.maxdocs=150}}. After changing this option to 15000, the results seem to be as expected. Because of that, we concluded that there has to be a problem with the sorting of the uncollapsed docset.
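One explanation would fit both this report and another comment in this thread, which says collapse.maxdocs drops the newest documents: the candidate set may be capped in index (doc id) order before it is sorted. The Java sketch below only illustrates that hypothesis. It is not taken from the patch, and MaxDocsSketch, capThenSort and sortThenCap are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class MaxDocsSketch {
    static final class Doc {
        final int id;       // internal doc id, i.e. index order (older docs have lower ids)
        final float score;  // relevance score used for the final ranking
        Doc(int id, float score) { this.id = id; this.score = score; }
    }

    // Hypothesized behaviour: keep the first maxDocs docs by id, then sort by score.
    static List<Integer> capThenSort(List<Doc> matches, int maxDocs) {
        List<Doc> byId = new ArrayList<>(matches);
        byId.sort((a, b) -> Integer.compare(a.id, b.id));
        List<Doc> capped = new ArrayList<>(byId.subList(0, Math.min(maxDocs, byId.size())));
        capped.sort((a, b) -> Float.compare(b.score, a.score));
        return ids(capped);
    }

    // Expected behaviour: sort by score first, then keep the best maxDocs docs.
    static List<Integer> sortThenCap(List<Doc> matches, int maxDocs) {
        List<Doc> byScore = new ArrayList<>(matches);
        byScore.sort((a, b) -> Float.compare(b.score, a.score));
        return ids(byScore.subList(0, Math.min(maxDocs, byScore.size())));
    }

    private static List<Integer> ids(List<Doc> docs) {
        List<Integer> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.id);
        }
        return out;
    }
}
```

Consider four matches where the best-scoring document has the highest doc id, with maxDocs=2. capThenSort never sees that document, while sortThenCap returns it first. Raising maxDocs hides the difference, which is consistent with the 150 versus 15000 observation.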


Also, we noticed a huge memory leak when using collapsing. We configured the component with {{<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>}}.
Without setting the option {{collapse.field}}, it works normally, and so far there are no memory problems. Once the Solr server receives requests with collapsing enabled, the whole heap fills up after a few requests (the old gen cannot be freed, the eden space is heavily in use, ...). Using a profiler, we noticed that the filterCache was extraordinarily large. We suspect a caching problem (the collapseCache was not enabled).


Additionally, it would be very useful if the parameter {{collapse=true|false}} worked again and could be used to enable or disable the collapsing functionality. Currently, the presence of a field chosen for collapsing enables this feature, and there is no way to configure the collapse fields within the request handlers. With that parameter, we could configure the fields in the handler and only enable or disable collapsing per request, as is conventional for other components (highlighting, faceting, ...).
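The requested behaviour can be sketched as follows. The collapse field comes from the handler defaults (or the request), and collapse=true|false decides per request whether collapsing runs. This hypothetical Java sketch uses plain maps in place of Solr's parameter classes, and resolveCollapseField is not part of the patch. As with a request handler's defaults in solrconfig.xml, a value sent with the request overrides the default.

```java
import java.util.Map;

final class CollapseToggle {
    // Returns the field to collapse on, or null if collapsing is off for this request.
    static String resolveCollapseField(Map<String, String> request, Map<String, String> defaults) {
        String enabled = lookup("collapse", request, defaults);
        String field = lookup("collapse.field", request, defaults);
        // Proposed rule: collapse only when collapse=true AND a field is configured somewhere.
        if (!"true".equalsIgnoreCase(enabled) || field == null || field.isEmpty()) {
            return null;
        }
        return field;
    }

    // A request parameter wins over the handler default of the same name.
    private static String lookup(String name, Map<String, String> request, Map<String, String> defaults) {
        String v = request.get(name);
        return v != null ? v : defaults.get(name);
    }
}
```

For example, suppose the handler defaults are collapse.field=site and collapse=false. A plain request is not collapsed, while a request that adds collapse=true is collapsed on site without repeating the field name.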


Patrick

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "JList (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607010#action_12607010 ] 

JList commented on SOLR-236:
----------------------------

Sorry about the dup. I obviously didn't check the comments before I posted the bug. Anyway, it's still there, it's still happening :)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792189#action_12792189 ] 

Uri Boness commented on SOLR-236:
---------------------------------

{quote}
Grant, this patch may not be perfect but I think we all agree that it is a great start. This is stable, used by many and has been well supported by the community. This is also a large patch and as I have known from my DataImportHandler experience, maintaining a large patch is quite a pain (and DataImportHandler didn't even touch the core). How about we commit this (after some review, of course), mark this as experimental (no guarantees of any sort) and then start improving it one issue at a time? Alternately, if you are not comfortable adding it to trunk, we can commit this on a branch and merge into trunk later.
{quote}

I think managing a separate branch will be just as hard as managing a patch. I do, however, agree that it's about time this patch was committed to the trunk. Even though the current solution is not scalable in terms of distributed search (and I agree that the current solution for that is not really viable), many are already using it, and it is the most wanted feature in JIRA after all. One thing you could do is apply the changes to the core (which are not really many) and commit the rest of the patch as a contrib (along with all the disclaimers Shalin mentioned above).

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565019#action_12565019 ] 

Yonik Seeley commented on SOLR-236:
-----------------------------------

I haven't been following this, so I don't know why there is a need for a NegatedDocSet (or if introducing it is the best solution), but it looks like you have two cases to handle: one negative set or two negative sets.
If you have a and -b, then return a.andNot(b)
If both a and b are negative (-a.intersection(-b)), then return NegatedDocSet(a.union(b))  // per De Morgan, -a&-b == -(a|b)

That's only for intersection() of course.
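The case analysis above can be sketched with java.util.BitSet standing in for Solr's DocSet. SignedDocSet is a hypothetical class, not the patch's NegatedDocSet. A flag records whether the bits hold the set itself or its complement, and intersection() covers all four sign combinations, including the symmetric -a & b case.

```java
import java.util.BitSet;

final class SignedDocSet {
    private final BitSet bits;      // stored doc ids
    private final boolean negated;  // true: the set is every doc NOT in bits

    SignedDocSet(BitSet bits, boolean negated) {
        this.bits = (BitSet) bits.clone(); // defensive copy so callers cannot mutate this set
        this.negated = negated;
    }

    boolean contains(int doc) {
        return bits.get(doc) != negated;
    }

    SignedDocSet intersection(SignedDocSet other) {
        if (!negated && !other.negated) {          // a & b
            BitSet r = (BitSet) bits.clone();
            r.and(other.bits);
            return new SignedDocSet(r, false);
        }
        if (!negated) {                            // a & -b == a.andNot(b)
            BitSet r = (BitSet) bits.clone();
            r.andNot(other.bits);
            return new SignedDocSet(r, false);
        }
        if (!other.negated) {                      // -a & b == b.andNot(a)
            BitSet r = (BitSet) other.bits.clone();
            r.andNot(bits);
            return new SignedDocSet(r, false);
        }
        BitSet r = (BitSet) bits.clone();          // -a & -b == -(a | b), per De Morgan
        r.or(other.bits);
        return new SignedDocSet(r, true);
    }
}
```

Only the last case produces a negated result, so a negative set never has to be materialized over the whole index.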

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Nuno Leitao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516119 ] 

Nuno Leitao commented on SOLR-236:
----------------------------------

It would be nice for this patch to also report on what documents were actually *collapsed* - for example, if the result list contained:

doc1
doc2
doc3

and doc2 and doc3 were collapsed, this would be reflected in the XML result as, so that one could determine that (forgive my crap visual representation):

doc1
 -> doc2
 -> doc3

Regards.
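Here is a minimal Java sketch of the requested output. It assumes the equivalent of collapse.max=1 in normal mode: the first document with a given field value is kept, and later ones are reported beneath it. CollapseGroups, group and render are hypothetical names, and this is not the response format the patch produces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CollapseGroups {
    // Maps each surviving ("head") doc to the docs collapsed under it, in result order.
    static Map<String, List<String>> group(List<String> docIds, List<String> fieldValues) {
        Map<String, String> headForValue = new LinkedHashMap<>();
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (int i = 0; i < docIds.size(); i++) {
            String doc = docIds.get(i);
            String value = fieldValues.get(i);
            String head = headForValue.get(value);
            if (head == null) {           // first doc with this value becomes the head
                headForValue.put(value, doc);
                groups.put(doc, new ArrayList<>());
            } else {                      // later docs are collapsed under that head
                groups.get(head).add(doc);
            }
        }
        return groups;
    }

    // Renders the groups as an indented tree, one doc per line.
    static String render(Map<String, List<String>> groups) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            sb.append(e.getKey()).append('\n');
            for (String child : e.getValue()) {
                sb.append(" -> ").append(child).append('\n');
            }
        }
        return sb.toString();
    }
}
```

For doc1, doc2 and doc3 sharing one field value, render(group(...)) produces exactly the tree sketched in the comment.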

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Dave Redford (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694851#action_12694851 ] 

Dave Redford edited comment on SOLR-236 at 4/10/09 6:47 PM:
------------------------------------------------------------

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request.

[Update: this is only an issue when both standard results and collapse results are present - which I was using for testing]

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong ordering (note: Id is our unique key field),

but adding another field, even a bogus one, works:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that, it's great so far... thanks to all!



  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Aytek Ekici (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765605#action_12765605 ] 

Aytek Ekici commented on SOLR-236:
----------------------------------

Hi Martijn,
The intersection of result sets is also a kind of "AND", right? I think intersecting docset A with docset B gives the same result set as "condA AND condB".

Your suggestion "fq=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]" works, and so does Anil's suggestion "fq=+lat:[37.2 TO 39.8] +lng:[24.5 TO 29.9]".
But neither allows multiple selections for a facet field: I can't use excludes, because they throw parsing errors.
Using "AND" between two filters in a single filter query results in only one item in the QueryCommand's filter list; I guess that is why the ex/tag local params can't be parsed or supported there.

I have two Solr instances here, one with the patch and one without, and I copied the configuration and data from one to the other. As far as I can see, the only difference is the field_collapsing patch. I'm trying to find what causes the difference in results, but I'm new to Solr, so it is taking time to understand what is going on. Any help or tips would be appreciated.

Thanks,
Aytek
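Aytek's point that intersecting two filter results matches a single "condA AND condB" filter can be checked on toy data. The minimal Java sketch below uses plain `java.util.BitSet` objects as stand-ins for Solr's doc sets, and the coordinates are made-up example values; it only shows the set equivalence and does not model how Solr parses `{!tag}`/`{!ex}` local params on separate `fq` parameters.

```java
import java.util.BitSet;

// Illustration only: BitSets stand in for Solr DocSets; field values are made up.
final class FilterIntersection {
    private FilterIntersection() {}

    /** Docs whose value lies in the inclusive range [lo, hi], like fq=field:[lo TO hi]. */
    static BitSet rangeFilter(double[] values, double lo, double hi) {
        BitSet bits = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] >= lo && values[doc] <= hi) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Two separate filters: each yields its own doc set, then the sets are intersected. */
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // leave the inputs untouched
        result.and(b);
        return result;
    }

    /** One combined filter: both conditions are checked per document ("condA AND condB"). */
    static BitSet conjunction(double[] lat, double[] lng,
                              double latLo, double latHi, double lngLo, double lngHi) {
        BitSet bits = new BitSet(lat.length);
        for (int doc = 0; doc < lat.length; doc++) {
            boolean latOk = lat[doc] >= latLo && lat[doc] <= latHi;
            boolean lngOk = lng[doc] >= lngLo && lng[doc] <= lngHi;
            if (latOk && lngOk) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

Both paths select the same documents, so any difference between the two forms lies in how the filters are represented in the request (one filter versus two), not in which documents match.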

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ron Veenstra (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716527#action_12716527 ] 

Ron Veenstra commented on SOLR-236:
-----------------------------------

Quick update: starting fresh, I was able to resolve the issue once Ant properly rebuilt the solr-core file. I'm not sure why previous attempts failed so completely. Many thanks for your help.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792350#action_12792350 ] 

Shalin Shekhar Mangar commented on SOLR-236:
--------------------------------------------

For Martijn:

{quote}
The reason I added <fieldCollapsing> ... </fieldCollapsing> was to be able to support sharing of collapseCollectorFactory instances between different collapse components in the near future. Do you think that is a valid reason? Or do you think that collapseCollectorFactories shouldn't be shared?
{quote}

I just don't think that we should introduce new tags and new kinds of components in solrconfig.xml, particularly those that are useful to only a single component. That introduces changes in SolrConfig.java so that it knows how to load such things. That is why I moved that configuration inside CollapseComponent. Ideally, all components will use PluginInfo and load whatever they need from their own PluginInfo object and SolrConfig would not need to be changed unless we introduce new kinds of Solr plugins.

Just curious, what would be a use-case for sharing factories (other than reducing duplication of configuration) and having multiple CollapseComponent?

{quote}
The CollapseComponentTest was failing. The field collapseCollectorFactories in CollapseComponent was null when no collapse collector factories were specified in the solrconfig.xml, which resulted in an NPE.
{quote}

Oops, sorry about that. I only ran the tests inside org.apache.solr.search.fieldcollapse. I didn't notice there are other tests too. Thanks!

bq. The DistributedFieldCollapsingIntegrationTest is still failing, because you left out changes in JettySolrRunner, CoreContainer and SolrDispatchFilter from my original patch.

I don't think we need to add that functionality to CoreContainer and SolrDispatchFilter. It is still possible to specify a different solrconfig and schema for a test. Let me see if I can make this work with BaseDistributedSearchTestCase.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791952#action_12791952 ] 

Noble Paul commented on SOLR-236:
---------------------------------

Shalin, the names may not be necessary on the collapseCollectorFactory, because they are never referred to by name.



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Charles Hornberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557505#action_12557505 ] 

Charles Hornberger commented on SOLR-236:
-----------------------------------------

Doug -- I just started looking into field collapsing the other day, but from glancing at the code in QueryComponent.java and CollapseComponent.java, it seems that you're probably not supposed to use both components: their prepare() methods are identical, and their process() methods both execute the user's search and put the resulting DocList into the "response" entry of the response object's internal storage Map. (The QueryComponent additionally stores the DocListAndSet in the ResponseBuilder object via builder.setResults(), though I'm not sure why, and prefetches documents if the result set is small enough.) My guess is that if you want to enable collapsing, you should use the CollapseComponent; if you want to disable it, use the QueryComponent. Maybe someone who understands the design of the search handling components better than I do can confirm this or correct my misunderstanding(s).

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Darrell Silver (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750254#action_12750254 ] 

Darrell Silver commented on SOLR-236:
-------------------------------------

Ha, so it is!  Thanks for the note; I'd totally missed that.

Returning only selected fields of the collapsed documents would be a good option for us. Also, in our subquery of the collapsed documents we're finding the first and last result (they're sorted by time, so this makes sense). I guess this is similar to Thomas' average problem, but for us it's not necessary to iterate over the entire subquery results.


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

I have updated the field collapse patch and made the following changes:
# Refactored the collapse code into a strategy pattern. The two distinct manners of collapsing now live in two different classes, which in my understanding makes the code cleaner and easier to understand. I have replaced the {{CollapseFilter}} with a {{DocumentCollapser}} interface. The {{DocumentCollapser}} has two concrete implementations, {{AdjacentDocumentCollapser}} and {{NonAdjacentDocumentCollapser}}. Both implementations share the same abstract base class, {{AbstractDocumentCollapser}}, which holds the fields and methods common to both.
# Removed deprecated Lucene methods in the {{PredefinedScorer}}.
# Fixed a bug in normal field collapsing: filter queries were handled as normal queries (they were combined via a boolean query) and thus were also used for scoring.
# Added more unit and integration tests, including two tests of faceting in combination with field collapsing. These tests cover faceting both before and after collapsing.

This patch only works with the Solr 1.4-dev from revision 804700 and later.
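The refactoring in item 1 is a classic strategy pattern. The Java sketch below borrows the class names from the comment but is otherwise hypothetical: the patch's real classes work on Lucene doc ids and DocSets, while the simplified `collapse(List<String>, int)` signature, the kept-position return value and the count bookkeeping are assumptions made for illustration. It contrasts the two `collapse.type` values from the issue description ("adjacent" and "normal"), treating `collapse.max` as the per-group limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch: class names follow the comment above, but the
// signatures are assumptions. The patch itself works on Lucene doc ids and DocSets.
interface DocumentCollapser {
    /** Returns the positions (into sortedKeys) of the documents that survive collapsing. */
    List<Integer> collapse(List<String> sortedKeys, int max);
}

abstract class AbstractDocumentCollapser implements DocumentCollapser {
    // Shared state: how many documents were collapsed away for each field value.
    private final Map<String, Integer> collapseCounts = new HashMap<>();

    protected void recordCollapsed(String key) {
        collapseCounts.merge(key, 1, Integer::sum);
    }

    Map<String, Integer> getCollapseCounts() {
        return collapseCounts;
    }
}

/** "adjacent": only runs of consecutive documents with the same (non-null) value collapse. */
class AdjacentDocumentCollapser extends AbstractDocumentCollapser {
    @Override
    public List<Integer> collapse(List<String> sortedKeys, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < sortedKeys.size(); i++) {
            String key = sortedKeys.get(i);
            run = key.equals(previous) ? run + 1 : 1; // restart the run on a new value
            previous = key;
            if (run <= max) {
                kept.add(i);
            } else {
                recordCollapsed(key);
            }
        }
        return kept;
    }
}

/** "normal": at most max documents per value are kept, wherever they appear. */
class NonAdjacentDocumentCollapser extends AbstractDocumentCollapser {
    @Override
    public List<Integer> collapse(List<String> sortedKeys, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i++) {
            String key = sortedKeys.get(i);
            int count = seen.merge(key, 1, Integer::sum);
            if (count <= max) {
                kept.add(i);
            } else {
                recordCollapsed(key);
            }
        }
        return kept;
    }
}
```

With keys a, a, b, a, a and max 1, the adjacent collapser keeps positions 0, 2 and 3 (two "a" documents collapsed), while the normal one keeps only 0 and 2 (three collapsed). Because the counts map lives on the instance, this sketch assumes a fresh collapser per request.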

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Domingo Gómez García (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701862#action_12701862 ] 

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 4:23 AM:
--------------------------------------------------------------------

I checked out svn release-1.3.0 and applied SOLR-236_collapsing.patch. After upgrading from 1.2 to 1.3.0 (patched), I get a lot of PermGen exceptions, especially in calls from SolrJ.

      was (Author: dgomezca):
    I checked out svn release-1.3.0 and applied SOLR-236_collapsing.patch. After running the "generate-maven-artifacts" task, I used the resulting distribution and requested http://localhost:8983/solr/select/?q=*:*&collapse.field=cat&collapse.max=1&collapse.type=normal (from the wiki). No results were collapsed. It seems to be ignoring CollapseComponent or something like that.
Do I have to configure something else?
Could anyone provide a working version/patch?

Thank you.
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch
>
>


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566864#action_12566864 ] 

oleg_gnatovskiy edited comment on SOLR-236 at 2/7/08 4:15 PM:
--------------------------------------------------------------

Hello everyone. I am planning to implement chain collapsing in a high-traffic production environment, so I'd like to use a stable version of Solr. There doesn't seem to be a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to collapse fine, but how do I get a count of the documents other than the one being displayed?

As a result I see:
<code>
<lst name="collapse_counts">
    <int name="Restaurant">2414</int>
    <int name="Bar/Club">9</int>
    <int name="Directory & Services">37</int>
</lst>
</code>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory & Services? If so, then that's great.

However when I collapse on some integer fields I get an empty list for collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!

  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501578 ] 

Emmanuel Keller commented on SOLR-236:
--------------------------------------

Adjacent collapsing is useful because it preserves the relevance of the sort.
The sort order is not modified: I copy the current sort to run a new search.
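
A minimal sketch of the difference between the two `collapse.type` modes, assuming `collapse.max` caps how many documents per group survive (consecutive documents for `adjacent`, all documents sharing a value for `normal`). It works on an already-sorted list and only drops entries, so the original sort order is kept in both modes. `CollapseSketch` and its members are illustrative, not the patch's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class CollapseSketch {

    enum CollapseType { NORMAL, ADJACENT }

    /**
     * Returns the positions (in the already-sorted list) of documents that
     * survive collapsing. Entries are only removed, never reordered.
     */
    static List<Integer> collapse(List<String> sortedValues, int max, CollapseType type) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> perValue = new HashMap<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < sortedValues.size(); i++) {
            String value = sortedValues.get(i);
            boolean keep;
            if (type == CollapseType.ADJACENT) {
                // Only consecutive documents with the same value form a group.
                run = Objects.equals(value, previous) ? run + 1 : 1;
                previous = value;
                keep = run <= max;
            } else {
                // Every document with the same value counts, wherever it appears.
                keep = perValue.merge(value, 1, Integer::sum) <= max;
            }
            if (keep) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> values = List.of("a", "a", "b", "a");
        System.out.println(collapse(values, 1, CollapseType.NORMAL));   // [0, 2]
        System.out.println(collapse(values, 1, CollapseType.ADJACENT)); // [0, 2, 3]
    }
}
```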

I am currently working on support for typed fields (int).

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Domingo Gómez García (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701862#action_12701862 ] 

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 6:46 AM:
--------------------------------------------------------------------

I checked out svn release-1.3.0 and applied SOLR-236_collapsing.patch.
Is there any way to integrate it with SolrJ?
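
Whatever SolrJ support a patch adds, the collapse options are ordinary request parameters, so any client can send them. In SolrJ itself, generic parameters can usually be set with `SolrQuery.set(name, value)`, although reading the collapse section of the response depends on what the patch provides. The stdlib-only sketch below just builds the request URL; the host, port and field name are assumptions borrowed from the Solr example setup, and `CollapseQueryUrl` is a hypothetical helper.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CollapseQueryUrl {

    /** Encodes parameters as an application/x-www-form-urlencoded query string. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "*:*");
        params.put("collapse.field", "manu_exact");
        params.put("collapse.max", "1");
        // Hypothetical local example server.
        System.out.println("http://localhost:8983/solr/select?" + toQueryString(params));
    }
}
```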

      was (Author: dgomezca):
    I checked out svn release-1.3.0 and applied SOLR-236_collapsing.patch. When I use the collapse parameters, I always get PermGen exceptions. How much memory does collapsing use compared with normal queries?
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch
>
>



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-236:
-------------------------------

    Attachment: SOLR-236-FieldCollapsing.patch

No real changes. Updated to apply against trunk.
Moved the valid values for CollapseType to a 'common' package.

- - - -

As a side note, when you make a patch, it's easiest to deal with if the paths are relative to the Solr root directory:

src/java/org/apache/solr/search/SolrIndexSearcher.java
is better than:
/Users/ekeller/Documents/workspace/solr/src/java/org/apache/solr/search/SolrIndexSearcher.java
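
The usual way to get relative paths is to run the diff from the checkout root (for example `svn diff` inside the Solr directory). If a path has already been captured in absolute form, turning it into the preferred form is just a relativization against the root, as in this small sketch; `PatchPaths` is a hypothetical helper, not part of any tool.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PatchPaths {

    /** Rewrites an absolute source path so it is relative to the Solr checkout root. */
    static String relativeToRoot(String root, String absolute) {
        Path rel = Paths.get(root).relativize(Paths.get(absolute));
        // Use forward slashes so the patch header looks the same on every platform.
        return rel.toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        System.out.println(relativeToRoot(
                "/Users/ekeller/Documents/workspace/solr",
                "/Users/ekeller/Documents/workspace/solr/src/java/org/apache/solr/search/SolrIndexSearcher.java"));
        // prints src/java/org/apache/solr/search/SolrIndexSearcher.java
    }
}
```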

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Thomas Traeger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Traeger updated SOLR-236:
--------------------------------

    Attachment: SOLR-236_collapsing.patch

This patch is based on the latest patch by Dmitry and addresses the following issues:
 * The CollapseComponent now simply falls back to the process method of QueryComponent when no collapse.field is defined. This fixes issues with the fq param when collapsing is disabled and makes CollapseComponent a fully compatible replacement for QueryComponent.
 * collapse.facet=before is now fixed; the previous patch ignored any filter queries (fq) and therefore returned wrong facet counts.
 * ResponseBuilder "builder" renamed to "rb" to match QueryComponent.
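
The fallback in the first bullet is a drop-in-replacement pattern: the collapse stage extends the plain query stage and defers to it whenever `collapse.field` is absent, so requests without collapsing (including their `fq` handling) behave exactly as before. The classes below are simplified, hypothetical stand-ins, not Solr's `QueryComponent` or the patch's `CollapseComponent`.

```java
import java.util.Map;

public class CollapseFallbackSketch {

    /** Stand-in for the plain query stage. */
    static class QueryStage {
        String process(Map<String, String> params) {
            return "query(fq=" + params.getOrDefault("fq", "") + ")";
        }
    }

    /** Stand-in for a collapse stage that can replace the query stage everywhere. */
    static class CollapseStage extends QueryStage {
        @Override
        String process(Map<String, String> params) {
            String field = params.get("collapse.field");
            if (field == null || field.isEmpty()) {
                // No collapsing requested: behave exactly like the parent stage.
                return super.process(params);
            }
            // Collapsing requested: reuse the normal query path, then collapse its result.
            return "collapse(" + field + ") over " + super.process(params);
        }
    }

    public static void main(String[] args) {
        QueryStage stage = new CollapseStage();
        System.out.println(stage.process(Map.of("fq", "inStock:true")));
        System.out.println(stage.process(Map.of("fq", "inStock:true", "collapse.field", "manu_exact")));
    }
}
```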

This patch applies to trunk (rev. 772433) but also works with Solr 1.3. For 1.3, you have to move CollapseParams.java from common/org/apache/solr/common/params to java/org/apache/solr/common/params/, as the location of this file has changed in trunk.

This is my first contribution, so any feedback is much appreciated. This is a great feature, so let's get it into Solr as soon as possible.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Traeger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754764#action_12754764 ] 

Thomas Traeger commented on SOLR-236:
-------------------------------------

I found the problem with my real-world data and reproduced it with the Solr example schema and data. In the Solr example, popularity is of type "int" and inStock is "boolean". I ran some more tests and could reproduce it with other field types too. Here are some examples using the field manu_exact (string):

[http://localhost:8983/solr/select/?q=*:*&sort=manu_exact%20asc&fl=id&collapse.field=inStock&collapse.includeCollapsedDocs=true]
-> as in the previous example, document id:VDBDB1A16 appears both in the result and in collapsedDocs

[http://localhost:8983/solr/select/?q=*:*&sort=manu_exact%20desc&fl=id&collapse.field=inStock&collapse.includeCollapsedDocs=true]
-> document id:VA902B appears both in the result and in collapsedDocs

[http://localhost:8983/solr/select/?q=*:*&sort=popularity%20desc&fl=id&collapse.field=manu_exact&collapse.includeCollapsedDocs=true]
-> document id:VS1GB400C3 appears both in the result and in collapsedDocs
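
A document that is displayed in the result should not also be listed among the collapsed documents, so reports like these can be turned into a quick regression check. This sketch assumes the response has already been parsed into two id lists (parsing is omitted); `CollapseOverlapCheck` is a hypothetical name, and the lists in `main` are made up apart from the two ids quoted above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CollapseOverlapCheck {

    /** Returns ids that appear both in the main result and in the collapsed documents. */
    static Set<String> overlap(List<String> resultIds, List<String> collapsedIds) {
        Set<String> both = new LinkedHashSet<>(resultIds); // keeps result order
        both.retainAll(Set.copyOf(collapsedIds));
        return both;
    }

    public static void main(String[] args) {
        List<String> result = List.of("VDBDB1A16", "VA902B");   // made-up parsed result ids
        List<String> collapsed = List.of("VDBDB1A16", "doc3");  // made-up parsed collapsedDocs ids
        Set<String> dup = overlap(result, collapsed);
        if (!dup.isEmpty()) {
            System.out.println("Documents both shown and collapsed: " + dup);
        }
    }
}
```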



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Dmitry Lihachev (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lihachev updated SOLR-236:
---------------------------------

    Attachment: SOLR-236_collapsing.patch

This patch (based on Dieter's patch) allows using the fq parameter.
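
Why the order of filtering and collapsing matters: if collapsing ran before the filter queries, a group whose top document is later filtered out would vanish even though other documents in it still match. A minimal sketch that applies every filter first and only then counts documents per group; the `Doc` record is hypothetical and filters are modelled as plain predicates rather than real `fq` queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FilterThenCollapse {

    record Doc(String id, String group) {}

    /** Applies all filters first, then keeps at most {@code max} docs per group, in ranked order. */
    static List<Doc> filterThenCollapse(List<Doc> ranked, List<Predicate<Doc>> filters, int max) {
        Map<String, Integer> perGroup = new HashMap<>();
        List<Doc> out = new ArrayList<>();
        for (Doc d : ranked) {
            if (!filters.stream().allMatch(f -> f.test(d))) {
                continue; // excluded by a filter, so it must not count toward its group
            }
            if (perGroup.merge(d.group(), 1, Integer::sum) <= max) {
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Doc> ranked = List.of(new Doc("1", "a"), new Doc("2", "a"), new Doc("3", "b"));
        // Stand-in for an fq that excludes document 1.
        List<Predicate<Doc>> filters = List.of(doc -> !doc.id().equals("1"));
        for (Doc d : filterThenCollapse(ranked, filters, 1)) {
            System.out.println(d.id()); // 2, then 3: group "a" survives through doc 2
        }
    }
}
```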

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch
>
>



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Thomas Woodard (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777659#action_12777659 ] 

Thomas Woodard edited comment on SOLR-236 at 11/13/09 9:10 PM:
---------------------------------------------------------------

I tried the build again, and you are right: it does work fine with the default search handler. I had been trying to get it working with our own search handler, which uses dismax. That still doesn't work. Here is the handler configuration, which works fine until collapsing is added.

{code:xml}
<requestHandler name="glsearch" class="solr.SearchHandler">
	<lst name="defaults">
		<str name="defType">dismax</str>
		<str name="qf">name^3 description^2 long_description^2 search_stars^1 search_directors^1 product_id^0.1</str>
		<str name="tie">0.1</str>
		<str name="facet">true</str>
		<str name="facet.field">stars</str>
		<str name="facet.field">directors</str>
		<str name="facet.field">keywords</str>
		<str name="facet.field">studio</str>
		<str name="facet.mincount">1</str>
	</lst>
</requestHandler>
{code}

Edit: The search fails even if you don't pass a collapse field.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: field_collapsing_1.1.0.patch

I still maintain a version for release 1.1.0 (the version we use in our production environment).

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing_1.1.0.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 4 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Traeger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749631#action_12749631 ] 

Thomas Traeger commented on SOLR-236:
-------------------------------------

I use collapsing in an online store and need to do a fairly complex price calculation for every collapse group, based on the products behind that group. I also thought about doing a second query, but that is not an option, as I would have to do it for every group (I have up to 100 groups per request). So doing the calculation outside of Solr, but retrieving the necessary data from Solr, seems to be the best approach for me. I agree that this functionality should be disabled by default.
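
The client-side approach can be sketched as a single pass over the collapsed documents returned with the response, grouped by their collapse value. The sketch assumes each document has already been read into a hypothetical `Product(group, price)` record and uses the lowest price as a stand-in for the real, more complex calculation, which the comment does not describe.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupPrices {

    record Product(String group, double price) {}

    /** Lowest price per collapse group, computed client-side from the collapsed documents. */
    static Map<String, Double> minPricePerGroup(List<Product> products) {
        return products.stream().collect(Collectors.toMap(
                Product::group,   // key: the collapse-field value
                Product::price,   // value: this product's price
                Math::min,        // two products in one group: keep the cheaper one
                TreeMap::new));   // sorted by group for stable output
    }

    public static void main(String[] args) {
        List<Product> docs = List.of(
                new Product("shirt-42", 19.99), new Product("shirt-42", 17.50), new Product("mug-7", 8.00));
        System.out.println(minPricePerGroup(docs)); // {mug-7=8.0, shirt-42=17.5}
    }
}
```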

Thanks for the pointer, I will have a look at it...

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Description: 
This patch includes a new feature called "Field collapsing".

"Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 4 new query parameters (SolrParams):
"collapse" set to true to enable collapsing.
"collapse.field" to choose the field used to group results
"collapse.type" normal (default value) or adjacent
"collapse.max" to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases


  was:
This patch includes a new feature called "Field collapsing".

"Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
"collapse" set to true to enable collapsing.
"collapse.field" to choose the field used to group results
"collapse.max" to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases
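
A sketch of how the four parameters in the updated description might be read on the server side. The defaults for `collapse` (false) and `collapse.max` (1), and the rule that `collapse=true` requires `collapse.field`, are assumptions; only the `normal` default for `collapse.type` comes from the description. `CollapseParamsSketch` and its types are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

public class CollapseParamsSketch {

    enum CollapseType { NORMAL, ADJACENT }

    record CollapseSettings(boolean enabled, String field, CollapseType type, int max) {}

    /** Reads the collapse parameters, applying the (partly assumed) defaults. */
    static CollapseSettings parse(Map<String, String> params) {
        boolean enabled = Boolean.parseBoolean(params.getOrDefault("collapse", "false"));
        String field = params.get("collapse.field");
        // "normal" is the documented default; valueOf rejects unknown types.
        CollapseType type = CollapseType.valueOf(
                params.getOrDefault("collapse.type", "normal").toUpperCase(Locale.ROOT));
        int max = Integer.parseInt(params.getOrDefault("collapse.max", "1")); // assumed default
        if (enabled && (field == null || field.isEmpty())) {
            throw new IllegalArgumentException("collapse=true requires collapse.field");
        }
        return new CollapseSettings(enabled, field, type, max);
    }

    public static void main(String[] args) {
        System.out.println(parse(Map.of("collapse", "true", "collapse.field", "site", "collapse.max", "2")));
    }
}
```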



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch
>
>



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Stephen Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679603#action_12679603 ] 

jove4015 edited comment on SOLR-236 at 3/6/09 6:13 AM:
------------------------------------------------------------

Help!!

We've been using this patch in production for months now, and in the last 3 days it has suddenly started crashing constantly.

[Edit - It's Ivan's latest patch, #3, with Solr 1.3 dist]

Mar 6, 2009 5:23:50 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
	at org.apache.solr.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:701)
	at org.apache.solr.util.OpenBitSet.ensureCapacity(OpenBitSet.java:711)
	at org.apache.solr.util.OpenBitSet.expandingWordNum(OpenBitSet.java:280)
	at org.apache.solr.util.OpenBitSet.set(OpenBitSet.java:221)
	at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:217)
	at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:171)
	at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:139)
	at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:52)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:324)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)


It seems to happen randomly - there's no special request happening, nothing new added to the index, nothing.  We've made no configuration changes. The only thing that's happened is more documents have been added since then.  The schema is the same, we have perhaps 200000 more documents in the index now than we did when we first went live with it.

It was a 32-bit machine allocated 2GB of RAM for Java before.  We just upgraded it to 64-bit and increased the heap space to 3GB, and still it went down last night.  I'm at my wits' end. I don't know what to do, but this functionality has been live so long now that it's going to be extremely painful to take it away.  Someone, please tell me if there's anything I can do to save this thing.
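For a sense of scale: the trace ends in OpenBitSet.ensureCapacityWords, called from CollapseFilter.addDoc, so a bitset is growing as documents are added to the collapse filter. A minimal sketch of the arithmetic, assuming one bit per document stored in a long[] (as OpenBitSet does) and ignoring array headers; the class name and the one-million-document index size are hypothetical, not taken from the report:

```java
// Rough heap estimate for one bitset covering every document in an index.
// Assumption: one bit per document, packed into a long[] (64 bits per word).
public class BitSetHeapEstimate {

    /** Bytes needed for the long[] words of a bitset holding numBits bits (array header ignored). */
    static long bitsetBytes(long numBits) {
        long words = (numBits + 63) / 64; // round up to whole 64-bit words
        return words * 8;                 // 8 bytes per long
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L; // hypothetical index size, not from the report
        System.out.println(bitsetBytes(maxDoc) + " bytes per bitset"); // 125000 bytes per bitset
    }
}
```

Under these assumptions a full-size bitset costs about 125 KB per million documents, so a single one is unlikely to fill a 3GB heap by itself. An OutOfMemoryError trace shows where the last allocation failed, not necessarily what consumed the heap; the thread itself does not establish the cause.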

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Bojan Smid (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599660#action_12599660 ] 

Bojan Smid commented on SOLR-236:
---------------------------------

I will try to bring this patch up to date. Currently I see two main problems:

1) The patch applies to trunk, but it doesn't compile. The problem occurs mainly because of changes in Search Components (for instance, some method signatures which CollapseComponent implements were changed). I have this fixed locally (more or less), but I have to test it before posting a new version of the patch.

2) It seems that CollapseComponent can't be chained with QueryComponent; it can only be used instead of it. CollapseComponent basically copies the QueryComponent querying logic and adds some of its own. I guess this isn't the right way to go. CollapseComponent should contain only collapsing logic and should be chainable with other components. Can anyone confirm if I'm right here? Of course, there might be some fundamental reason why CollapseComponent had to be implemented this way.

Does anyone else see any other issues with this component?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778147#action_12778147 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

What kind of exception is occurring if you use dismax (with and without field collapsing)? If I do a collapse search with dismax in the example setup (http://localhost:8983/solr/select/?q=power&collapse.field=inStock&qt=dismax) field collapsing appears to be working. 
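The example request above can also be assembled programmatically. A minimal sketch using only java.net.URLEncoder; the class and method names are hypothetical, and collapse.type or collapse.max would be added the same way. Encoding matters once values contain spaces or brackets, as in a range filter such as lat:[37.2 TO 39.8]:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class CollapseQueryUrl {

    /** Builds a select URL from ordered parameters, URL-encoding each value (keys are assumed URL-safe). */
    static String buildUrl(String base, Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "power");
        params.put("collapse.field", "inStock");
        params.put("qt", "dismax");
        // prints http://localhost:8983/solr/select/?q=power&collapse.field=inStock&qt=dismax
        System.out.println(buildUrl("http://localhost:8983/solr/select/", params));
    }
}
```

A filter value such as lat:[37.2 TO 39.8] would be sent as lat%3A%5B37.2+TO+39.8%5D.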

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

@Marc. This was a silly bug that occurred when no field collapse cache was defined in solrconfig.xml. I have attached a patch that fixes it, so you can use field collapsing without configuring a field collapse cache. Caching with field collapsing is an optional feature.

@Chad. Due to changes in the trunk, applying the previous patch will result in merge conflicts. The new patch can be applied without merge conflicts. This means that applying this patch to the 1.4 source will probably result in merge conflicts.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792509#action_12792509 ] 

Noble Paul commented on SOLR-236:
---------------------------------

bq.This patch has quite a resource/performance hit. I've seen and read about the resource hit. It's rather large.

The performance price is paid only if you use this component. Having the functionality itself in Solr is quite important. Performance can obviously be improved (faceting got a 50 times performance boost in 1.4). As long as the performance of the component is within the acceptable range, we should leave that call to the user. The cost actually depends on the data set too.

As long as the component has a correct public API (req params/response format/configuration) I believe it can be committed with a clear warning.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-236:
----------------------------------

    Comment: was deleted

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Aytek Ekici (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765076#action_12765076 ] 

Aytek Ekici edited comment on SOLR-236 at 10/13/09 6:48 AM:
------------------------------------------------------------

Hi all,
Just applied "field-collapse-5.patch" and I guess there are problems with filter queries.

Here it is:

1- select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284

2- select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912

3- select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419

4- When using "q" instead of "fq" which is: 
select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (which is the only correct number)

The thing is, as I understand it, instead of applying "AND" across the filter queries it applies "OR". I checked select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as 3rd one)

Any idea how to fix this?
Thx.
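The reported counts support that reading exactly. Two filter queries should be intersected, so the combined result can never exceed the smaller filter (6284), yet 19419 is precisely what inclusion-exclusion gives for the union: 6284 + 16912 - 3777 = 19419. A small check of that arithmetic, using only the numbers above (the class and method names are illustrative):

```java
public class FilterCountCheck {

    /** Size of (A OR B) from the sizes of A, B and (A AND B), by inclusion-exclusion. */
    static long unionSize(long a, long b, long intersection) {
        return a + b - intersection;
    }

    public static void main(String[] args) {
        long lat = 6284;   // numFound for fq=lat:[37.2 TO 39.8]
        long lng = 16912;  // numFound for fq=lng:[24.5 TO 29.9]
        long both = 3777;  // numFound for q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
        // Intersecting two filters can never return more than the smaller one.
        System.out.println("AND upper bound: " + Math.min(lat, lng)); // 6284
        System.out.println("OR (union): " + unionSize(lat, lng, both)); // 19419
    }
}
```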

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495367 ] 

Yonik Seeley commented on SOLR-236:
-----------------------------------

Thanks for looking into this Emmanuel.
It appears as if this only collapses adjacent documents, correct?
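To make the distinction concrete: later revisions of the patch, quoted elsewhere in this thread, expose it as collapse.type=normal (default) or adjacent. A minimal sketch of the two semantics over a ranked list of collapse-field values, assuming collapse.max is the number of documents kept per group (or per run of consecutive equal values, for adjacent collapsing); the method names are hypothetical and this is not the patch's CollapseFilter code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollapseSketch {

    /** Adjacent collapsing: keep at most max documents from each run of consecutive equal values. */
    static List<String> adjacent(List<String> values, int max) {
        List<String> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (String v : values) {
            run = v.equals(previous) ? run + 1 : 1; // restart the count when the value changes
            previous = v;
            if (run <= max) {
                kept.add(v);
            }
        }
        return kept;
    }

    /** Normal collapsing: keep at most max documents per value anywhere in the ranked list. */
    static List<String> normal(List<String> values, int max) {
        List<String> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        for (String v : values) {
            int count = seen.merge(v, 1, Integer::sum); // documents seen so far for this value
            if (count <= max) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical ranked results, identified only by their collapse-field value (e.g. a site).
        List<String> ranked = Arrays.asList("a", "a", "b", "a", "a", "c");
        System.out.println(adjacent(ranked, 1)); // [a, b, a, c]
        System.out.println(normal(ranked, 1));   // [a, b, c]
    }
}
```

Adjacent collapsing only needs to remember the previous value, so it can stream through the results; normal collapsing has to keep a count for every distinct value it has seen.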

We should really try to get everyone on the same page... hash out the exact semantics of "collapsing", and the most useful interface.  An efficient implementation  can follow.

A good starting point might be here:

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Karsten Sperling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539507 ] 

Karsten Sperling commented on SOLR-236:
---------------------------------------

I've just looked at the implementation of this patch again -- it ends up calling SolrIndexSearcher.getDocListC() with a DocSet derived from the CollapseFilter as the 'filter' parameter. The comment on that method says that only filter or filterList should be provided, but not both. However with the field collapsing patch both WILL be provided if filter queries are passed to the dismax request handler by the client. Can anybody shed any light on what the implications of this are?
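Whatever getDocListC does when both are passed, the intended result is that a document must satisfy the collapse filter and every filter query. A minimal sketch of that combination, using java.util.BitSet as a stand-in for Solr's DocSet; the helper names are hypothetical, and this is not how the patch or SolrIndexSearcher actually implement it:

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class CombineFilters {

    /** Intersects the collapse filter with every filter-query result: a doc must pass all of them. */
    static BitSet combine(BitSet collapseFilter, List<BitSet> filterQueries) {
        BitSet result = (BitSet) collapseFilter.clone(); // don't mutate the caller's set
        for (BitSet fq : filterQueries) {
            result.and(fq);
        }
        return result;
    }

    /** Convenience helper: a BitSet with the given document ids set. */
    static BitSet of(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet collapse = of(0, 2, 3, 5);
        BitSet fqLat = of(0, 1, 2, 3);
        BitSet fqLng = of(2, 3, 4, 5);
        System.out.println(combine(collapse, Arrays.asList(fqLat, fqLng))); // {2, 3}
    }
}
```

Passing one pre-intersected set as the single filter would respect the "filter or filterList, not both" contract described above.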

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714742#action_12714742 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

Hey guys, are there any plans to make field collapsing work on multi-shard systems?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714750#action_12714750 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

I'm looking forward to hearing about your experiences with this patch, particularly in production.

I think that in order to make collapsing work on multi-shard systems, the process method of the CollapseComponent needs to be modified.
CollapseComponent already subclasses QueryComponent (which already supports querying on multi-shard systems), so it should not be that difficult.
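The distributed case needs one extra step beyond what QueryComponent already does. Each shard can only collapse its own documents, so a group value that occurs on several shards has to be collapsed again when the shard responses are merged. Below is a minimal sketch of that merge step. It assumes each shard returns (group value, score) pairs. Hit and mergeAndCollapse are hypothetical names, not Solr APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardMergeSketch {

    // Hypothetical per-shard hit: the collapse-field value and the relevance score.
    static final class Hit {
        final String group;
        final float score;

        Hit(String group, float score) {
            this.group = group;
            this.score = score;
        }
    }

    // Merge hits from all shards by descending score, then collapse again so
    // that a group value present on several shards appears at most max times.
    static List<Hit> mergeAndCollapse(List<List<Hit>> shardResults, int max) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            all.addAll(shard);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());

        Map<String, Integer> perGroup = new HashMap<>();
        List<Hit> merged = new ArrayList<>();
        for (Hit h : all) {
            if (perGroup.merge(h.group, 1, Integer::sum) <= max) {
                merged.add(h);
            }
        }
        return merged;
    }
}
```

With max = 1, the hits x(3.0), y(1.0) from one shard and x(2.5), z(2.0) from another merge to x, z, y.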

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574745#action_12574745 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

Are there any plans to add collapse controls to SolrJ?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Doug Steigerwald (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556032#action_12556032 ] 

dsteigerwald edited comment on SOLR-236 at 1/4/08 11:43 AM:
----------------------------------------------------------------

I've created a CollapseComponent for field collapsing. Everything seems to work fine with it. The only issue I'm having is that I cannot use the query component: when it isn't commented out, the non-collapsed results are displayed as well, and I can't figure out how to remove them. Someone might be able to figure that part out.

[http://localhost:8983/solr/search?q=id:[0%20TO%20*]&collapse=true&collapse.field=inStock&collapse.type=normal&collapse.threshold=0]

Here's the config I'm using:

    <searchComponent name="collapse"     class="org.apache.solr.handler.component.CollapseComponent" /> 
    <requestHandler name="/search" class="solr.SearchHandler">
        <lst name="defaults">
            <str name="echoParams">explicit</str>
        </lst>
        <arr name="components">
            <!--       <str>query</str> -->
            <str>facet</str>
            <!--       <str>mlt</str> -->
            <!--       <str>highlight</str> -->
            <!--       <str>debug</str> -->
            <str>collapse</str>
        </arr>
  </requestHandler>
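The example request above can also be assembled programmatically. The sketch below uses plain java.net.URLEncoder rather than SolrJ. The handler path and parameter names are copied from the example URL and are not checked against any particular patch version, and CollapseRequestSketch is a hypothetical name. URLEncoder produces form encoding, so spaces become '+' rather than '%20'.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper that builds a GET URL for the /search handler above.
class CollapseRequestSketch {

    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(separator)
               .append(encode(e.getKey()))
               .append('=')
               .append(encode(e.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // Form-encode a single key or value; UTF-8 is always supported by the JVM.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the resulting URL predictable.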

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: SOLR-236-trunk.patch

@Thomas
Somehow the SolrJ code was left out when I created the patch yesterday. I guess I accidentally deleted it when I was moving the code to the new trunk. Anyhow, I have updated the patch so that it includes the SolrJ code, and it should now apply cleanly.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

Hi Aytek,

I was able to reproduce the situation you described earlier. When I was testing yesterday, I thought I was testing on a Solr instance without the patch, but I wasn't. Anyhow, I have fixed the bug and attached a new patch. Good thing you noticed this bug; it was really corrupting the search results.

Martijn

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793565#action_12793565 ] 

Uri Boness commented on SOLR-236:
---------------------------------

@Yonik

As far as I understand your collapse algorithm proposal, you'd like to save memory by restricting group creation to only those groups that belong to the requested results page. Besides losing faceting support over the collapsed DocSet, I think there might be a problem with pagination as well. For every page you'll end up with a different total count and therefore a different number of pages. This can be very confusing from the user's perspective. Imagine going to the first page and calculating (and displaying) that there are 3 pages of results. Then, when the user asks for the second page, s/he gets a response with 2 pages and a different total count.
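The following is a deliberately simplified model of the concern, not Yonik's actual proposal. If groups are only built until the requested page is filled, the number of groups known at that point depends on which page was requested. PagedGroupCountSketch and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PagedGroupCountSketch {

    // Distinct groups over the full (uncollapsed) result list: stable across pages.
    static int totalGroups(List<String> groupValues) {
        return new HashSet<>(groupValues).size();
    }

    // Distinct groups seen while scanning only until the requested page is full
    // (page is 0-based, rows = groups per page). Hypothetical stand-in for
    // restricting group creation to the requested page.
    static int groupsSeenUpToPage(List<String> groupValues, int page, int rows) {
        int needed = (page + 1) * rows; // groups required to fill pages 0..page
        Set<String> seen = new HashSet<>();
        for (String v : groupValues) {
            seen.add(v);
            if (seen.size() >= needed) {
                break; // stop scanning once the page can be filled
            }
        }
        return seen.size();
    }

    // Number of pages implied by a group count.
    static int pages(int groupCount, int rows) {
        return (groupCount + rows - 1) / rows;
    }
}
```

Take the group values a, a, b, c, c, d, e, f, g with rows = 2. Scanning for page 0 finds 2 groups (1 page), and scanning for page 1 finds 4 groups (2 pages). The full list has 7 groups (4 pages), so the reported total changes from page to page.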

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: field_collapsing_1.3.patch

Here is the patch for Solr 1.3 rev 589395.

I made some performance improvements. There is no more cache: we now use a BitDocSet or a HashDocSet, depending on the solrconfig.hashdocsetmaxsize variable.
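The size-based choice described above can be sketched as follows. This is a simplified model that assumes the threshold plays the role of the hashdocsetmaxsize setting. HashDocs and BitDocs are hypothetical stand-ins for Solr's HashDocSet and BitDocSet, built on java.util.HashSet and java.util.BitSet.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

class DocSetChoiceSketch {

    // Minimal stand-in for a document set: membership test plus size.
    interface Docs {
        boolean exists(int doc);
        int size();
    }

    // Sparse representation: memory grows with the number of matching documents.
    static final class HashDocs implements Docs {
        private final Set<Integer> docs = new HashSet<>();
        HashDocs(int[] ids) { for (int id : ids) docs.add(id); }
        public boolean exists(int doc) { return docs.contains(doc); }
        public int size() { return docs.size(); }
    }

    // Dense representation: roughly one bit per document id.
    static final class BitDocs implements Docs {
        private final BitSet bits = new BitSet();
        BitDocs(int[] ids) { for (int id : ids) bits.set(id); }
        public boolean exists(int doc) { return bits.get(doc); }
        public int size() { return bits.cardinality(); }
    }

    // Pick the sparse form for small result sets and the dense form otherwise.
    static Docs build(int[] ids, int hashDocSetMaxSize) {
        return ids.length <= hashDocSetMaxSize ? new HashDocs(ids) : new BitDocs(ids);
    }
}
```

A hash set costs memory in proportion to the number of matches, while a bit set costs roughly one bit per document in the index. That is why the sparse form is cheaper for small result sets.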

Regards,
Emmanuel Keller.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828039#action_12828039 ] 

Koji Sekiguchi commented on SOLR-236:
-------------------------------------

A random comment: don't we need to check that collapse.field is indexed in checkCollapseField()?

{code}
protected void checkCollapseField(IndexSchema schema) {
  SchemaField schemaField = schema.getFieldOrNull(collapseField);
  if (schemaField == null) {
    throw new RuntimeException("Could not collapse, because collapse field does not exist in the schema.");
  }

  if (schemaField.multiValued()) {
    throw new RuntimeException("Could not collapse, because collapse field is multivalued");
  }

  if (schemaField.getType().isTokenized()) {
    throw new RuntimeException("Could not collapse, because collapse field is tokenized");
  }
}
{code}

I accidentally specified an unindexed field for collapse.field and got unexpected results without any errors.
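Below is a sketch of the check with the proposed indexed test added. FieldInfo is a hypothetical stand-in for Solr's SchemaField, so the example compiles on its own. In the patch itself, the equivalent test would presumably be a call such as schemaField.indexed(), but that is an assumption.

```java
// Hypothetical stand-in for the few SchemaField properties the check needs.
class CollapseFieldCheckSketch {

    static final class FieldInfo {
        final boolean indexed;
        final boolean multiValued;
        final boolean tokenized;

        FieldInfo(boolean indexed, boolean multiValued, boolean tokenized) {
            this.indexed = indexed;
            this.multiValued = multiValued;
            this.tokenized = tokenized;
        }
    }

    // Same checks as the quoted method, plus an "is it indexed?" check, so that an
    // unindexed collapse.field fails loudly instead of silently returning unexpected results.
    static void checkCollapseField(String collapseField, FieldInfo field) {
        if (field == null) {
            throw new RuntimeException("Could not collapse, because collapse field " + collapseField + " does not exist in the schema.");
        }
        if (!field.indexed) {
            throw new RuntimeException("Could not collapse, because collapse field " + collapseField + " is not indexed");
        }
        if (field.multiValued) {
            throw new RuntimeException("Could not collapse, because collapse field " + collapseField + " is multivalued");
        }
        if (field.tokenized) {
            throw new RuntimeException("Could not collapse, because collapse field " + collapseField + " is tokenized");
        }
    }
}
```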

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Iván de Prado (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Iván de Prado updated SOLR-236:
-------------------------------

    Attachment: collapsing-patch-to-1.3.0-ivan_2.patch

A new patch that fixes the problems in my first submitted patch.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

I have updated the field collapse patch and improved the response format. Check [my blog|http://blog.jteam.nl/2009/11/11/improved-field-collapse-response/] for more details.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
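To illustrate how collapse.type and collapse.max might interact, here is a minimal Java sketch. It assumes that "normal" caps the number of results per field value across the whole ranked list, and that "adjacent" caps only runs of consecutive results sharing a value. CollapseSketch and its methods are hypothetical names for illustration, not code from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the collapse.type / collapse.max semantics (not the patch's code).
 * Input: the collapse.field value of each hit, in ranked order.
 */
class CollapseSketch {

    enum CollapseType { NORMAL, ADJACENT }

    /** Returns the positions (in ranked order) of hits that survive collapsing. */
    static List<Integer> collapse(List<String> fieldValues, CollapseType type, int max) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> seenPerValue = new HashMap<>(); // NORMAL: count per value over the whole list
        String previous = null;                               // ADJACENT: value of the previous hit
        int runLength = 0;                                    // ADJACENT: length of the current run

        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            int count;
            if (type == CollapseType.NORMAL) {
                count = seenPerValue.merge(value, 1, Integer::sum);
            } else {
                runLength = value.equals(previous) ? runLength + 1 : 1; // a new value resets the run
                previous = value;
                count = runLength;
            }
            if (count <= max) {
                kept.add(i); // still within the allowed number of results for this value
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("a.com", "a.com", "b.com", "a.com", "a.com");
        System.out.println(collapse(sites, CollapseType.NORMAL, 1));   // [0, 2]
        System.out.println(collapse(sites, CollapseType.ADJACENT, 1)); // [0, 2, 3]
    }
}
```

The difference shows on the fourth hit: normal collapsing drops it because a.com was already seen, while adjacent collapsing keeps it because b.com broke the run.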

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654979#action_12654979 ] 

Ryan McKinley commented on SOLR-236:
------------------------------------

What is the "localsolr" field you are talking about?

Is it the solr stuff from http://sourceforge.net/projects/locallucene ?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658609#action_12658609 ] 

Ryan McKinley commented on SOLR-236:
------------------------------------

I see there is a patch against 1.3; is there any current patch against trunk? (We would need something against trunk in order to consider this for 1.4.)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Nikolai Kordulla (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572660#action_12572660 ] 

Nikolai Kordulla commented on SOLR-236:
---------------------------------------

It would be good to apply this CollapseComponent to the mlt (MoreLikeThis) results as well.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Charles Hornberger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Hornberger updated SOLR-236:
------------------------------------

    Attachment: field_collapsing_dsteigerwald.diff

Attaching a new copy of Doug Steigerwald's patch that omits the System.out.println() call in CollapseComponent.java.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792997#action_12792997 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

ttdi,
The latest patch is not in sync with the latest trunk. You can try to apply it to the trunk anyway, or use a previous patch with the 1.4 code.

Yonik,
The parameter descriptions are a bit poor. The response format of the older patches contains two separate lists of collapse group counts. The first list contains counts per most relevant document id and is enabled or disabled with the collapse.info.doc param. The second list contains counts per field value of the most relevant document and is controlled with the collapse.info.count param. Now that the response format has changed, we should rename these to something more descriptive. Maybe something like collapse.showCount, which adds the collapse count to the collapse group in the response (defaults to true), and collapse.showFieldValue, which adds the field value of the most relevant document to the group (defaults to false)?
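As an illustration only, a minimal Java sketch of the two count lists described above. It assumes normal collapsing that keeps one document per group, and it defines the count as the number of hits folded into a group with the head document excluded; the patch may count differently. CollapseCounts and Hit are hypothetical names, not the patch's classes or response format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: the two collapse count lists, for normal collapsing
 * with one document kept per group. Each hit is (docId, fieldValue) in ranked order.
 */
class CollapseCounts {

    record Hit(int docId, String fieldValue) {}

    /** Counts keyed by the id of the most relevant (first-ranked) document of each group. */
    static Map<Integer, Integer> countsPerHeadDocId(List<Hit> ranked) {
        Map<String, Integer> headOf = new LinkedHashMap<>();
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (Hit hit : ranked) {
            Integer head = headOf.putIfAbsent(hit.fieldValue(), hit.docId());
            if (head == null) {
                counts.put(hit.docId(), 0);          // first hit with this value becomes the head
            } else {
                counts.merge(head, 1, Integer::sum); // later hits are collapsed into the head
            }
        }
        return counts;
    }

    /** The same counts, keyed by the field value shared by the group. */
    static Map<String, Integer> countsPerFieldValue(List<Hit> ranked) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Hit hit : ranked) {
            counts.merge(hit.fieldValue(), 0, (old, ignored) -> old + 1); // 0 for the head, +1 per collapsed hit
        }
        return counts;
    }
}
```

Both maps carry the same numbers; they differ only in the key, which is why a single count plus an optional field value per group, as proposed above, could replace the two lists.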

The collapse.maxdocs parameter specifies after how many processed documents field collapsing is aborted. I have never used it. I can imagine that one would use it to shorten the search time.

The collapse.includeCollapsedDocs.fl parameter enables a collapse collector that collects the documents that have been discarded and outputs the specified fields of those discarded documents to the field collapse response, per collapse group (* for all fields). The parameter name does not entirely reflect that behaviour. Do you think collapse.collectDiscardedDocuments.fl would be better? Personally, however, I would not use this, because of its negative impact on performance. Usually one wants to know something like the average / highest / lowest price of a collapse group. The AggregateCollapseCollector would fit those needs better.
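A rough sketch of the per-group aggregation mentioned above (lowest, highest and average price per collapse group), using java.util.stream on in-memory documents. GroupAggregates and Doc are made-up names; this does not show the API of the patch's AggregateCollapseCollector, only the kind of summary it is described as producing.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of per-group price statistics, computed once per group
 * instead of returning every discarded document.
 */
class GroupAggregates {

    record Doc(String groupValue, double price) {}

    static Map<String, DoubleSummaryStatistics> pricePerGroup(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                Doc::groupValue,
                TreeMap::new,                               // sorted group keys
                Collectors.summarizingDouble(Doc::price))); // count, min, max, average, sum
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("shop-1", 10.0), new Doc("shop-1", 30.0), new Doc("shop-2", 5.0));
        pricePerGroup(docs).forEach((group, stats) -> System.out.printf(
                "%s min=%.1f max=%.1f avg=%.1f%n", group, stats.getMin(), stats.getMax(), stats.getAverage()));
    }
}
```

The response then carries a few numbers per group, no matter how many documents were collapsed, which is where the performance advantage over collecting the discarded documents would come from.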

bq. Should I be able to specify a completely different sort within a group? collapse.sort=... seems nice... what are the implications? One bit of strangeness: it would seem to allow a highly ranked document responsible for the group being at the top of the list being dropped from the group due to a different sort criteria within the group. It's not necessarily an implementation problem though (sort values for the group should be maintained separately).

I'm not sure about that. It would make things more complicated. Sorting the discarded documents, in combination with the collapse.includeCollapsedDocs.fl functionality, would perhaps make more sense.

bq. The most basic question about the interface would be how to present groups. Do we stick with a linear document list and supplement that with extra info in a different part of the response (as the current approach takes)? Or stick that extra info in with some of the documents somehow? Or if collapse=true, replace the list of documents with a list of groups, each which can contain many documents? Which will be easiest for clients to deal with? If you were starting from scratch and didn't have to deal with any of Solr's current shortcomings, what would it look like?

I think the latter would make more sense, because field-collapsing does change the search result. It would just make it more obvious.

bq. Is there a way to specify the number of groups that I want back instead of the number of documents?
No, there is not, but if the list of documents is replaced with a list of groups, then the rows parameter should indicate the number of groups to display instead of the number of documents.
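To make the "list of groups" option concrete, here is a minimal sketch. It assumes that every hit is grouped (not only the top ten), that groups are ordered by their most relevant hit, and that start/rows page over groups. GroupedPage, Hit and Group are hypothetical names, not a proposed response format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: start/rows select groups, not documents.
 * Groups are ordered by their first-ranked (most relevant) hit.
 */
class GroupedPage {

    record Hit(int docId, String fieldValue) {}
    record Group(String fieldValue, List<Integer> docIds) {}

    static List<Group> page(List<Hit> ranked, int start, int rows) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>(); // keeps first-seen (rank) order
        for (Hit hit : ranked) {
            groups.computeIfAbsent(hit.fieldValue(), k -> new ArrayList<>()).add(hit.docId());
        }
        List<Group> all = new ArrayList<>();
        groups.forEach((value, ids) -> all.add(new Group(value, ids)));

        int from = Math.min(start, all.size());
        int to = Math.min(from + rows, all.size());
        return all.subList(from, to); // rows counts groups here
    }
}
```

Because every hit is grouped before paging, all.size() is a natural total for this response; grouping only the top documents would leave that total undefined.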

One thought about the algorithm you propose: if you only create collapse groups for the top ten documents, what happens to the total count of the search? Unique documents outside the top ten are not grouped (if I understand you correctly), and given how the total count currently works, that would affect it.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Kevin Cunningham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830305#action_12830305 ] 

Kevin Cunningham edited comment on SOLR-236 at 2/5/10 11:06 PM:
----------------------------------------------------------------

Regarding Patrick's comment about a memory leak, we are seeing something similar - very large memory usage and eventually using all the available memory.  Were there any confirmed issues that may have been addressed with the later patches?  We're using the 12-24 patch.  Any toggles we can switch to still get the feature, yet minimize the memory footprint?

We had been running the 11-29 field-collapse-5.patch patch and saw nothing near this amount of memory consumption.

What fixes would we be missing if we ran Solr 1.4 with the last "field-collapse-5.patch" patch?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638359#action_12638359 ] 

Mark Miller commented on SOLR-236:
----------------------------------

bq. What's a hard drive sort? 

Sorry - was not very clear.

Just like sorting, finding dupes can be done in memory or using external storage (the hard drive). I am only just looking into this myself, but it seems that in the best case you would want to do it in memory with a hash-based approach, which can scale linearly. If you have too many items to check for dupes, you have to use external storage; one good method is two external sorts (we already get one from the search), but I think there are other options too.
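A minimal in-memory sketch of the two strategies described above: a single hash-based pass, and a sort-based pass in which duplicates become adjacent after sorting by key, followed by a second sort back into rank order. List.sort stands in for an external (on-disk) merge sort here; DupeFinding is a hypothetical name and nothing in it comes from the patches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of two ways to keep the first (best-ranked) hit per key.
 * keys.get(i) is the collapse field value of the hit ranked i.
 */
class DupeFinding {

    /** Hash approach: one pass, expected O(n); memory grows with the number of distinct keys. */
    static List<Integer> firstOccurrencesByHash(List<String> keys) {
        Set<String> seen = new HashSet<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (seen.add(keys.get(i))) { // add() returns false for a key already seen
                kept.add(i);
            }
        }
        return kept;
    }

    /**
     * Sort approach: sort positions by key so duplicates become adjacent, keep the first
     * of each run, then sort the survivors back into rank order. List.sort is stable, so
     * equal keys stay in rank order and the first of each run is the best-ranked hit.
     */
    static List<Integer> firstOccurrencesBySort(List<String> keys) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) positions.add(i);
        positions.sort(Comparator.comparing((Integer pos) -> keys.get(pos)));

        List<Integer> kept = new ArrayList<>();
        String previous = null;
        for (int pos : positions) {
            String key = keys.get(pos);
            if (!key.equals(previous)) kept.add(pos); // first entry of a run of equal keys
            previous = key;
        }
        kept.sort(null); // second sort: restore ranked order
        return kept;
    }
}
```

Both methods return the same positions; the sort-based one only needs sequential access to the data, which is what makes it workable when the keys are spilled to disk.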

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>             Fix For: 1.4
>
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-solr-236-2.patch

Thanks for the feedback. I fixed the problem you described and added a new patch containing the fix.
The problem occurred when sorting was done on one or more normal fields and on score.



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538339 ] 

ekeller edited comment on SOLR-236 at 10/28/07 1:55 PM:
----------------------------------------------------------------

Here is the patch for solr 1.3 rev 589395.

I made some performance improvements. There is no more cache. I use a BitDocSet or a HashDocSet, depending on the solrconfig.hashdocsetmaxsize variable.
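A hypothetical sketch of picking a document-set representation by expected size. java.util.HashSet and java.util.BitSet stand in for Solr's HashDocSet and BitDocSet, and the hashDocSetMaxSize argument stands in for the setting mentioned above; the patch's actual selection logic may differ.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: small sets use a hash set (memory per stored id),
 * large sets use a bitset (one bit per doc id up to the largest id stored).
 */
interface CollapseDocSet {
    void add(int docId);
    boolean contains(int docId);

    static CollapseDocSet forExpectedSize(int expectedSize, int hashDocSetMaxSize) {
        return expectedSize <= hashDocSetMaxSize ? new HashBacked() : new BitBacked();
    }

    /** Stand-in for a hash-based doc set. */
    final class HashBacked implements CollapseDocSet {
        private final Set<Integer> ids = new HashSet<>();
        public void add(int docId) { ids.add(docId); }
        public boolean contains(int docId) { return ids.contains(docId); }
    }

    /** Stand-in for a bitset-based doc set; java.util.BitSet grows as needed. */
    final class BitBacked implements CollapseDocSet {
        private final BitSet bits = new BitSet();
        public void add(int docId) { bits.set(docId); }
        public boolean contains(int docId) { return bits.get(docId); }
    }
}
```

The trade-off is the usual one: a hash set costs memory per stored id, while a bitset costs a fixed bit per document id, so the bitset wins once the set is a sizeable fraction of the index.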

Regards,
Emmanuel Keller.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Stephen Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656716#action_12656716 ] 

Stephen Weiss commented on SOLR-236:
------------------------------------

I get an error on certain searches with Ivan's latest patch.

Dec 15, 2008 2:32:00 PM org.apache.solr.core.SolrCore execute
INFO: [ss_image_core] webapp=/solr path=/select params={collapse=true&facet.limit=5&wt=json&rows=50&json.nl=map&start=0&sort=add_date+desc,+object_id+asc&facet=true&collapse.facet=after&f.season.facet.limit=-1&facet.mincount=1&fl=object_id&q=object_type:image+AND+classif_name:(19097)+AND+market:(49154)+AND+perms:(1835+OR+4785+OR+1725+OR+1690+OR+2816+OR+3149+OR+3082+OR+2815+OR+2814+OR+3083+OR+4783)&version=1.2&f.classif_name.facet.limit=-1&collapse.field=link_id&collapse.threshold=1&facet.field=classif_name&facet.field=market&facet.field=season&facet.field=city&facet.field=designer&facet.field=category&facet.field=keywords&facet.field=lifestyle} hits=263059 status=500 QTime=4508 
Dec 15, 2008 2:32:00 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 41386
	at org.apache.solr.util.OpenBitSet.fastSet(OpenBitSet.java:235)
	at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:214)
	at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:171)
	at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:139)
	at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:52)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:324)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)


Unfortunately, it happens every time this specific search is run, while many, many other searches with similar result set sizes and equal or considerably greater complexity execute fine. I honestly can't tell what is special about this one search that makes it fail.

For now the patch is offline until we can figure something out for it...  I can provide access to the machine (I managed to reproduce it in a test environment)  if it would help determine what the problem is / make the software better for everyone.
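The trace shows the exception inside OpenBitSet.fastSet, called from CollapseFilter.addDoc. fastSet does not check bounds; it expects the index to be smaller than the size the set was created for. The thread does not establish the root cause. One hypothesis worth checking is a bitset sized from a smaller number than the largest document id it receives. The hypothetical class below (not Solr's OpenBitSet) reproduces that failure mode and shows a checked alternative.

```java
/**
 * Hypothetical fixed-capacity bitset illustrating the failure mode in the trace above.
 * fastSet() skips the capacity check, so an id past the last allocated 64-bit word
 * fails with ArrayIndexOutOfBoundsException inside the long[] access.
 */
class SizedBitSet {
    private final long[] words;
    private final int numBits;

    SizedBitSet(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6]; // 64 bits per word
    }

    /** Unchecked: the caller must guarantee 0 <= index < numBits. */
    void fastSet(int index) {
        words[index >> 6] |= 1L << index; // the shift uses only the low 6 bits of index
    }

    /** Checked variant that fails with a descriptive message instead. */
    void set(int index) {
        if (index < 0 || index >= numBits) {
            throw new IllegalArgumentException(
                    "doc id " + index + " outside bitset of size " + numBits + "; size it to maxDoc");
        }
        fastSet(index);
    }

    boolean get(int index) {
        return index >= 0 && index < numBits && (words[index >> 6] & (1L << index)) != 0;
    }
}
```

If this hypothesis holds, sizing the set to the index's maxDoc (or using the checked path while debugging) would turn the intermittent failure into either a fix or a clear error message.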

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)
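The three parameters in the description (collapse.field, collapse.type, collapse.max) can be illustrated with a small model. This is a sketch under stated assumptions, not the patch's code: the class and method names are hypothetical, the input is just the collapse-field value of each document in ranked order, and collapse.max is assumed to mean "at most max documents per field value" for the normal type and "at most max consecutive documents with the same value" for the adjacent type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical model of collapse.type and collapse.max (not the patch's code).
 * Input: the collapse.field value of each document, in ranked order.
 * Output: the ranks (indices) of the documents that survive collapsing.
 */
class CollapseSketch {

  /** "normal": keep at most max documents per field value across the whole result. */
  static List<Integer> collapseNormal(List<String> values, int max) {
    Map<String, Integer> seen = new HashMap<>();
    List<Integer> kept = new ArrayList<>();
    for (int i = 0; i < values.size(); i++) {
      // merge returns the updated number of documents seen for this value
      int count = seen.merge(values.get(i), 1, Integer::sum);
      if (count <= max) {
        kept.add(i);
      }
    }
    return kept;
  }

  /** "adjacent": keep at most max documents per run of consecutive equal values. */
  static List<Integer> collapseAdjacent(List<String> values, int max) {
    List<Integer> kept = new ArrayList<>();
    String previous = null;
    int run = 0;
    for (int i = 0; i < values.size(); i++) {
      String current = values.get(i);
      // a new run starts whenever the value differs from the previous document's
      run = Objects.equals(current, previous) ? run + 1 : 1;
      previous = current;
      if (run <= max) {
        kept.add(i);
      }
    }
    return kept;
  }
}
```

With values a, a, b, a, a and max 1, the normal model keeps ranks 0 and 2, while the adjacent model also keeps rank 3 because the value "a" starts a new run there.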

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794252#action_12794252 ] 

Uri Boness commented on SOLR-236:
---------------------------------

{quote}If we are returning a number of documents (as opposed to a number of groups) to the user, how do they avoid splitting on a page in the middle of the group?{quote}

As far as I know (Martijn, correct me if I'm wrong), Martijn's patch returns the number of groups *and* documents, where each group is actually represented as a document. So in that sense, the total count applies to the result set as is (groups count as documents) and therefore pagination just works. 
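If each group is returned as a single entry, the total count counts groups, and ordinary start/rows paging applies directly to the collapsed result set. A minimal sketch of that arithmetic, assuming numFound is the number of groups (the helper names are hypothetical):

```java
/** Hypothetical paging helpers for a result set where each collapsed group is one entry. */
class GroupPagingSketch {

  /** Number of pages needed to show numGroups entries with rows entries per page. */
  static int pageCount(int numGroups, int rows) {
    if (rows <= 0) {
      throw new IllegalArgumentException("rows must be positive");
    }
    // integer ceiling division
    return (numGroups + rows - 1) / rows;
  }

  /** Zero-based start offset (Solr's start parameter) for a one-based page number. */
  static int startFor(int page, int rows) {
    if (page < 1) {
      throw new IllegalArgumentException("page must be >= 1");
    }
    return (page - 1) * rows;
  }
}
```

Because a group never spans two entries, no page boundary can fall in the middle of a group.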

{quote}The only thing this algorithm can't do (related to pagination) is give the total number of documents after collapsing (and hence can't calculate the exact number of pages). This can be fine in many circumstances as long as the gui handles it (people don't seem to mind google doing it... I just tried it. Google didn't show the result count right unless displaying the last page).{quote}

First of all, I must admit that I never noticed that in Google, so I guess you're right :-). But when you think about it, with Google, how many times do you get a low hit count that only fits in 2-3 pages? I hardly ever get one, and when I do I don't even bother to check the results; I just try to improve my search. With Solr it's often different, especially when discovery features and faceting are used to narrow the search extensively... I'm not saying that an imperfect pagination mechanism is a problem... not at all, I'm just saying that it *might* be an issue for specific use cases or specific domains... but that's just an assumption (or a gut feeling) :-)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yaniv S. (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802334#action_12802334 ] 

Yaniv S. commented on SOLR-236:
-------------------------------

Hi all, this is a very exciting feature and I'm trying to apply it to our system.
I've tried applying the patch to 1.4 and to the trunk version, but both give me build errors.
Any suggestions on how I can build 1.4 or the latest trunk with this patch?

Many Thanks,
Yaniv



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792446#action_12792446 ] 

Grant Ingersoll commented on SOLR-236:
--------------------------------------

I'm curious whether anyone has thought of using the Clustering component for this. If your "collapse" field were a single token, I wonder whether you would get the results you're looking for.



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792477#action_12792477 ] 

Mark Miller commented on SOLR-236:
----------------------------------

I'm with Grant on this one. Trunk is not a sandbox, and getting more developer attention is not a good reason to put something in trunk. Issues should go in when they are ready.

Tons of interest and votes don't mean we should rush it to trunk; if that kind of thing moves you, it means you should start putting some work into making it ready for trunk.

This patch has quite a resource/performance hit. I've seen and read about the resource hit; it's rather large. The performance hit is no better: the linked blog post measures performance with collapsing as 5-10 times slower than without.

Personally, I don't think this issue is ready for trunk.



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769878#action_12769878 ] 

Martijn van Groningen edited comment on SOLR-236 at 10/27/09 4:34 PM:
----------------------------------------------------------------------

I have attached a new patch that includes a major refactoring, which makes the code more flexible and cleaner. The patch also adds new aggregate functionality and a bug fix.

h3. Aggregate function and bug fix
The new patch allows you to execute aggregate functions on the collapsed documents (for example, summing the stock amount or calculating the minimum price of a collapsed group). Currently there are four aggregate functions available: sum(), min(), max() and avg(). To execute one or more functions, the _collapse.aggregate_ parameter has to be added to the request URL. The parameter expects the following syntax: _function_name(field_name)[, function_name(field_name)]_. For example, collapse.aggregate=sum(stock), min(price) might produce a result like this:
{code:xml}
<lst name="aggregatedResults">
   <lst name="sum(stock)">
      <str name="Amsterdam">10</str>
      ...
   </lst>
   <lst name="min(price)">
      <str name="Amsterdam">5.99</str>
      ...
   </lst>
</lst>
{code}
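The collapse.aggregate syntax and the four functions can be sketched as follows. This is a hypothetical illustration rather than the patch's implementation (which configures SumFunction, AverageFunction, MinFunction and MaxFunction classes): it assumes field names consist of word characters only and represents each group's field values as a List<Double>.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of parsing collapse.aggregate and applying sum/min/max/avg. */
class AggregateSketch {

  // function_name(field_name), optionally surrounded by whitespace
  private static final Pattern CALL = Pattern.compile("\\s*(\\w+)\\((\\w+)\\)\\s*");

  /** Parses e.g. "sum(stock), min(price)" into [[sum, stock], [min, price]]. */
  static List<String[]> parse(String param) {
    List<String[]> calls = new ArrayList<>();
    for (String part : param.split(",")) {
      Matcher m = CALL.matcher(part);
      if (!m.matches()) {
        throw new IllegalArgumentException("Bad aggregate: " + part);
      }
      calls.add(new String[] {m.group(1), m.group(2)});
    }
    return calls;
  }

  /** Applies one aggregate function to the field values of a single collapsed group. */
  static double apply(String function, List<Double> values) {
    switch (function) {
      case "sum": return values.stream().mapToDouble(Double::doubleValue).sum();
      case "min": return values.stream().mapToDouble(Double::doubleValue).min().orElse(Double.NaN);
      case "max": return values.stream().mapToDouble(Double::doubleValue).max().orElse(Double.NaN);
      case "avg": return values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
      default: throw new IllegalArgumentException("Unknown function: " + function);
    }
  }

  /** Computes one aggregate per group, keyed by the group's field value (e.g. "Amsterdam"). */
  static Map<String, Double> perGroup(String function, Map<String, List<Double>> valuesByGroup) {
    Map<String, Double> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<Double>> e : valuesByGroup.entrySet()) {
      result.put(e.getKey(), apply(function, e.getValue()));
    }
    return result;
  }
}
```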

The patch also fixes a bug inside the {{NonAdjacentDocumentCollapser}} that was reported on the solr-user mailing list a few days ago. An index out of bounds exception was thrown when documents were removed from an index and a field collapse search was done afterwards.  

h3. Code refactoring
The code refactoring includes the following things:
* The notion of a {{CollapseGroup}}. A collapse group defines what a unique group is in the search result, and this differs between the adjacent and non-adjacent document collapsers. For adjacent field collapsing, a group is defined by its field value and the document id of the most relevant document in that group, so more than one collapse group may have the same field value. For normal (non-adjacent) field collapsing, the group is defined just by the field value.
* The notion of a {{CollapseCollector}} that receives the collapsed documents from a {{DocumentCollapser}} and does something with them, for example keeping a count of how many documents were collapsed per collapse group or computing an average of a certain field such as price. As you can see in the code, a collapse group, rather than a field value or document id, is used to identify each group.
{code}
/**
 * A <code>CollapseCollector</code> is responsible for receiving collapse callbacks from the <code>DocumentCollapser</code>.
 * An implementation can choose what to do with the received callbacks and data. Whatever an implementation collects it
 * is responsible for adding its results to the response.
 *
 * Implementations of this interface don't need to be thread-safe!
 */
public interface CollapseCollector {

  /**
   * Informs the <code>CollapseCollector</code> that a document has been collapsed under the specified collapseGroup.
   *
   * @param docId The id of the document that has been collapsed
   * @param collapseGroup The collapse group the docId has been collapsed under
   * @param collapseContext The collapse context
   */
  void documentCollapsed(int docId, CollapseGroup collapseGroup, CollapseContext collapseContext);

  /**
   * Informs the <code>CollapseCollector</code> about the document head.
   * The document head is the id of the most relevant document for the specified collapseGroup.
   *
   * @param docHeadId The identifier of the document head
   * @param collapseGroup The collapse group of the document head
   * @param collapseContext The collapse context
   */
  void documentHead(int docHeadId, CollapseGroup collapseGroup, CollapseContext collapseContext);

  /**
   * Adds the <code>CollapseCollector</code> implementation specific result data to the result.
   *
   * @param result The response result 
   * @param docs The documents to be added to the response
   * @param collapseContext The collapse context
   */
  void getResult(NamedList result, DocList docs, CollapseContext collapseContext);

}
{code}
There is also a {{CollapseContext}} that allows you to store data that can be shared between {{CollapseCollectors}}.
* A {{CollapseCollectorFactory}} is responsible for creating a {{CollapseCollector}}. It does this based on the {{SolrQueryRequest}}. All the logic for deciding when to enable a certain {{CollapseCollector}} must be placed in the factory.
{code}
/**
 * A concrete <code>CollapseCollectorFactory</code> implementation is responsible for creating {@link CollapseCollector}
 * instances based on the {@link SolrQueryRequest}.
 */
public interface CollapseCollectorFactory {

  /**
   * Creates an instance of a CollapseCollector specified by the concrete subclass.
   * The concrete subclass decides, based on the specified request, whether a new instance has to be created
   * and may return <code>null</code> if not.
   * 
   * @param request The specified request
   * @return an instance of a CollapseCollector or <code>null</code>
   */
  CollapseCollector createCollapseCollector(SolrQueryRequest request);

}
{code}
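To make the two {{CollapseGroup}} definitions and the callback flow concrete, here is a simplified sketch. The types are hypothetical stand-ins (SimpleCollapseGroup, CountingCollapseCollector) with the Solr-specific CollapseContext, NamedList and DocList parameters dropped; it is not the patch's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for CollapseGroup. */
final class SimpleCollapseGroup {
  final String fieldValue;
  // Only meaningful for adjacent collapsing; -1 for normal collapsing.
  final int headDocId;

  private SimpleCollapseGroup(String fieldValue, int headDocId) {
    this.fieldValue = fieldValue;
    this.headDocId = headDocId;
  }

  /** Normal (non-adjacent) collapsing: the field value alone identifies a group. */
  static SimpleCollapseGroup normal(String fieldValue) {
    return new SimpleCollapseGroup(fieldValue, -1);
  }

  /** Adjacent collapsing: field value plus the id of the run's most relevant document. */
  static SimpleCollapseGroup adjacent(String fieldValue, int headDocId) {
    return new SimpleCollapseGroup(fieldValue, headDocId);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SimpleCollapseGroup)) {
      return false;
    }
    SimpleCollapseGroup other = (SimpleCollapseGroup) o;
    return headDocId == other.headDocId && Objects.equals(fieldValue, other.fieldValue);
  }

  @Override
  public int hashCode() {
    return Objects.hash(fieldValue, headDocId);
  }
}

/** Hypothetical collector counting collapsed documents per group via the two callbacks. */
class CountingCollapseCollector {
  private final Map<SimpleCollapseGroup, Integer> counts = new LinkedHashMap<>();

  /** The head itself is not counted as collapsed; this sketch ignores docHeadId. */
  void documentHead(int docHeadId, SimpleCollapseGroup group) {
    counts.putIfAbsent(group, 0);
  }

  /** Each collapsed document increments its group's count; docId is ignored here. */
  void documentCollapsed(int docId, SimpleCollapseGroup group) {
    counts.merge(group, 1, Integer::sum);
  }

  /** Stand-in for getResult(NamedList, DocList, CollapseContext). */
  Map<SimpleCollapseGroup, Integer> getResult() {
    return counts;
  }
}
```

Two adjacent groups with the same field value but different head documents compare unequal, matching the description that more than one adjacent collapse group may share a field value.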
Currently there are four {{CollapseCollectorFactory}} implementations:
# {{DocumentGroupCountCollapseCollectorFactory}} creates {{CollapseCollectors}} that collect the collapse counts per document group and return the counts in the response, keyed by the id of each collapsed group's most relevant document.
# {{FieldValueCountCollapseCollectorFactory}} creates {{CollapseCollectors}} that collect the collapse count per collapsed group and return the counts in the response, keyed by each collapsed group's field value.
# {{DocumentFieldsCollapseCollectorFactory}} creates {{CollapseCollectors}} that collect predefined field values from collapsed documents.
# {{AggregateCollapseCollectorFactory}} creates {{CollapseCollectors}} that create aggregate statistics based on the collapsed documents.
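The factory contract (return a collector, or null when the request does not enable it) might look like this in simplified form. SketchCollector, SketchCollectorFactory and FactorySketch are hypothetical, a Map of request parameters stands in for SolrQueryRequest, and tying the aggregate collector to the presence of collapse.aggregate is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for CollapseCollector; only exposes a name here. */
interface SketchCollector {
  String name();
}

/** Hypothetical stand-in for CollapseCollectorFactory; a parameter map replaces SolrQueryRequest. */
interface SketchCollectorFactory {
  /** Returns a collector, or null when the request does not enable this collector. */
  SketchCollector create(Map<String, String> params);
}

class FactorySketch {

  /** Enabled only when the request carries collapse.aggregate (assumption). */
  static final SketchCollectorFactory AGGREGATE = params -> {
    if (!params.containsKey("collapse.aggregate")) {
      return null; // not requested: no collector for this request
    }
    return () -> "aggregate";
  };

  /** Always enabled in this sketch. */
  static final SketchCollectorFactory FIELD_VALUE_COUNT = params -> () -> "fieldValueCount";

  /** Asks each factory, in configured order, and keeps only the non-null collectors. */
  static List<SketchCollector> collectorsFor(List<SketchCollectorFactory> factories,
                                             Map<String, String> params) {
    List<SketchCollector> collectors = new ArrayList<>();
    for (SketchCollectorFactory factory : factories) {
      SketchCollector collector = factory.create(params);
      if (collector != null) {
        collectors.add(collector);
      }
    }
    return collectors;
  }
}
```

Keeping the enable/disable decision in the factory means the component itself only has to skip null results.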
{{CollapseCollectorFactories}} are configured in solrconfig.xml, and by default all implementations in the patch are configured, so the following configuration is sufficient:
{code:xml}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" />
{code}
The following configuration explicitly sets up the same {{CollapseCollectorFactories}} as the previous one:
{code:xml}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
    <arr name="collapseCollectorFactories">
        <str>groupDocumentsCounts</str>
        <str>groupFieldValue</str>
        <str>groupDocumentsFields</str>
        <str>groupAggregatedData</str>
    </arr>
</searchComponent>

<fieldCollapsing>
    <collapseCollectorFactory name="groupDocumentsCounts" class="solr.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupFieldValue" class="solr.fieldcollapse.collector.FieldValueCountCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupDocumentsFields" class="solr.fieldcollapse.collector.DocumentFieldsCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupAggregatedData" class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
        <lst name="aggregateFunctions">
            <str name="sum">org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction</str>
            <str name="avg">org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction</str>
            <str name="min">org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction</str>
            <str name="max">org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction</str>
        </lst>
    </collapseCollectorFactory>
</fieldCollapsing>
{code}
The configured {{CollapseCollectorFactories}} can be shared among different {{CollapseComponents}}. Most users do not need this explicit configuration, but if you are using your own implementations (or someone else's), you have to configure the {{CollapseCollectorFactory}} implementation this way. The order in collapseCollectorFactories matters: {{CollapseCollectors}} may share data via the {{CollapseContext}}, so they can depend on the order in which they run. The {{CollapseCollectorFactories}} in the patch do not share data, but other implementations may.
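Why the order can matter is easiest to see with two collectors sharing one context. In this hypothetical sketch, SketchContext stands in for {{CollapseContext}}, ContextCollector for the result step of a {{CollapseCollector}}, and the published value is arbitrary; COUNTER writes a value that REPORTER reads, so reversing the order leaves REPORTER with nothing to read.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for CollapseContext: per-request storage shared by collectors. */
class SketchContext {
  private final Map<String, Object> data = new HashMap<>();

  void put(String key, Object value) {
    data.put(key, value);
  }

  Object get(String key) {
    return data.get(key);
  }
}

/** Hypothetical stand-in for the result step of a CollapseCollector. */
interface ContextCollector {
  void addResult(Map<String, Object> response, SketchContext context);
}

class OrderSketch {

  /** Publishes a value into the shared context (the value 3 is arbitrary). */
  static final ContextCollector COUNTER = (response, context) -> {
    context.put("groupCount", 3);
    response.put("groupCount", 3);
  };

  /** Copies whatever COUNTER published; sees null if it runs first. */
  static final ContextCollector REPORTER = (response, context) ->
      response.put("reported", context.get("groupCount"));

  /** Runs the collectors in configured order against one shared context. */
  static Map<String, Object> run(List<ContextCollector> ordered) {
    Map<String, Object> response = new HashMap<>();
    SketchContext context = new SketchContext();
    for (ContextCollector collector : ordered) {
      collector.addResult(response, context);
    }
    return response;
  }
}
```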

The new patch contains a lot of changes, but I personally think it is a real improvement, especially the introduction of the {{CollapseCollectors}}, which allow a lot of flexibility. Btw, any feedback or questions are welcome.

      was (Author: martijn):
    I have attached a new patch which includes a major refactoring which makes the code more flexible and cleaner. The patch also includes a new aggregate functionality and a bug fix.

h3. Aggregate function and bug fix
The new patch allows you to execute aggregate functions on the collapsed documents (for example sum the stock amount or calculating the minimum price of a collapsed group). Currently there are four aggregate functions available: sum(), min(), max() and avg(). To execute one or more functions the _collapse.aggregate_ parameter has to be added to the request url. The parameter expects the following syntax: _function_name(field_name)[, function_name(field_name)]_. For example: collapse.aggregate=sum(stock), min(price) and might have a result like this:
{code:xml}
<lst name="aggregatedResults">
   <lst name="sum(stock)">
      <str name="Amsterdam">10</str>
      ...
   </lst>
   <lst name="min(price)">
      <str name="Amsterdam">5.99</str>
      ...
   </lst>
</lst>
{code}

The patch also fixes a bug inside the {{NonAdjacentDocumentCollapser}} that was reported on the solr-user mailing list a few days ago. An index out of bounds exception was thrown when documents were removed from an index and a field collapse search was done afterwards.  

h3. Code refactoring
The code refactoring includes the following things:
* The notion of a {{CollapseGroup}}. A collapse group defines what an unique group is in the search result. For the adjacent and non adjacent document collapser this is different. For adjacent field collapsing a group is defined by its field value and the document id of the most relevant document in that group. More then one collapse group may have the same fieldvalue. For normal field collapsing (non adjacent) the group is defined just by the field value. 
* The notion of a {{CollapseCollector}} that receives the collapsed documents from a {{DocumentCollector}} and does something with it. For example keeps a count of how many documents were collapsed per collapse group or computes an average of a certain field like price. As you can see in the code instead of using field values or document ids a collapse group is used for identifying a collapse group.
{code}
/**
 * A <code>CollapseCollector</code> is responsible for receiving collapse callbacks from the <code>DocumentCollapser</code>.
 * An implementation can choose what to do with the received callbacks and data. Whatever an implementation collects it
 * is responsible for adding its results to the response.
 *
 * Implementation of this interface don't need to be thread safe!
 */
public interface CollapseCollector {

  /**
   * Informs the <code>CollapseCollector</code> that a document has been collapsed under the specified collapseGroup.
   *
   * @param docId The id of the document that has been collasped
   * @param collapseGroup The collapse group the docId has been collapsed under
   * @param collapseContext The collapse context
   */
  void documentCollapsed(int docId, CollapseGroup collapseGroup, CollapseContext collapseContext);

  /**
   * Informs the <code>CollapseCollector</code> about the document head.
   * The document head is the most relevant id for the specified collapseGroup.
   *
   * @param docHeadId The identifier of the document head
   * @param collapseGroup The collapse group of the document head
   * @param collapseContext The collapse context
   */
  void documentHead(int docHeadId, CollapseGroup collapseGroup, CollapseContext collapseContext);

  /**
   * Adds the <code>CollapseCollector</code> implementation specific result data to the result.
   *
   * @param result The response result 
   * @param docs The documents to be added to the response
   * @param collapseContext The collapse context
   */
  void getResult(NamedList result, DocList docs, CollapseContext collapseContext);

}
{code}
There is also a {{CollapseContext}} that allows you store data that can be shared between {{CollapseCollectors}}. 
* A {{CollapseCollectorFactory}} is responsible for creating a {{CollepseCollector}}. It does this based on the {{SolrQueryRequest}}. All the logic for when to enable a certain {{CollapseCollector}} must be placed in the factory. 
{code}
/**
 * A concrete <code>CollapseCollectorFactory</code> implementation is responsible for creating {@link CollapseCollector}
 * instances based on the {@link SolrQueryRequest}.
 */
public interface CollapseCollectorFactory {

  /**
   * Creates an instance of a CollapseCollector specified by the concrete subclass.
   * The concrete subclass decides based on the specified request if an new instance has to be created and
   * can return <code>null</code> for that matter.
   * 
   * @param request The specified request
   * @return an instance of a CollapseCollector or <code>null</code>
   */
  CollapseCollector createCollapseCollector(SolrQueryRequest request);

}
{code}
Currently there are four {{CollapseCollectorFactories}} implementations:
# {{DocumentGroupCountCollapseCollectorFactory}} creates {{CollapseCollectors}} that collect the collapse counts per document group and return the counts in the response per collapsed group most relevant document id.
# {{FieldValueCountCollapseCollectorFactory}} creates {{CollapseCollectors}} that collect the collapse count per collapsed group and return the counts in the response per collepsed group field value.
# {{DocumentFieldsCollapseCollectorFactory}} creates {{CollapseCollectors}} that collect predefined fieldvalues from collapsed documents.
# {{AggregateCollapseCollectorFactory}} creates {{CollapseCollectors}} that create aggregate statistics based on the collapsed documents.
{{CollapseCollectorFactories}} are configured in the solrconfig.xml and by default all implementations in the patch are configured. The following configuration is sufficient 
{code:xml}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" />
{code}
The following configurations configures the same {{CollapseCollectorFactories}} as the previous configuration:
{code:xml}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
    <arr name="collapseCollectorFactories">
        <str>groupDocumentsCounts</str>
        <str>groupFieldValue</str>
        <str>groupDocumentsFields</str>
        <str>groupAggregatedData</str>
    </arr>
</searchComponent>

<fieldCollapsing>
    <collapseCollectorFactory name="groupDocumentsCounts" class="solr.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupFieldValue" class="solr.fieldcollapse.collector.FieldValueCountCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupDocumentsFields" class="solr.fieldcollapse.collector.DocumentFieldsCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupAggregatedData" class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
        <lst name="aggregateFunctions">
            <str name="sum">org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction</str>
            <str name="avg">org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction</str>
            <str name="min">org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction</str>
            <str name="max">org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction</str>
        </lst>
    </collapseCollectorFactory>
</fieldCollapsing>
{code}
The configured {{CollapseCollectorFactories}} can be shared among different {{CollapseComponents}}. Most users do not need this explicit configuration, but when you use your own implementation (or someone else's) you have to configure it this way so that the {{CollapseCollectorFactory}} implementation is registered. The order in collapseCollectorFactories matters: {{CollapseCollectors}} may share data via the {{CollapseContext}}, so one collector may depend on data put there by another. The {{CollapseCollectorFactories}} in the patch do not share data, but other implementations may.
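Why the order matters can be sketched with two hypothetical collectors that share a context. {{SharedContext}}, {{SketchCollector}} and the two collectors below are illustrative stand-ins, not the patch's {{CollapseContext}} or {{CollapseCollector}} API: one collector stores a per-group value in the shared context and the other only reads it, so the reader finds nothing when it is listed first.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a collapse context: a key/value store shared by all collectors. */
final class SharedContext {
    private final Map<String, Object> values = new HashMap<>();
    void put(String key, Object value) { values.put(key, value); }
    Object get(String key) { return values.get(key); }
}

/** Hypothetical, simplified collector contract (the real CollapseCollector differs). */
interface SketchCollector {
    /** Called once per group with the number of documents in that group. */
    void collect(String groupValue, int groupSize, SharedContext context);
}

final class OrderedCollectors {
    /** Producer: publishes how many documents of the group were collapsed. */
    static final class CountCollector implements SketchCollector {
        public void collect(String group, int groupSize, SharedContext ctx) {
            ctx.put("collapsed:" + group, groupSize - 1); // one document is kept, the rest collapsed
        }
    }

    /** Consumer: relies on CountCollector's output instead of recomputing it. */
    static final class LabelCollector implements SketchCollector {
        final Map<String, String> labels = new LinkedHashMap<>();
        public void collect(String group, int groupSize, SharedContext ctx) {
            Object collapsed = ctx.get("collapsed:" + group);
            labels.put(group, collapsed == null
                    ? group + " (count unavailable)"
                    : group + " (" + collapsed + " more)");
        }
    }

    /** Runs the collectors for every group, strictly in the configured list order. */
    static void run(List<? extends SketchCollector> collectors, Map<String, Integer> groups, SharedContext ctx) {
        for (Map.Entry<String, Integer> group : groups.entrySet()) {
            for (SketchCollector collector : collectors) {
                collector.collect(group.getKey(), group.getValue(), ctx);
            }
        }
    }
}
```

With the order [CountCollector, LabelCollector] a group of three documents is labelled "(2 more)"; with the reversed order the label collector sees no shared value.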

The new patch contains a lot of changes, but I personally think that it is a real improvement, especially the introduction of the {{CollapseCollectors}}, which allows a lot of flexibility. By the way, any feedback or questions are welcome.
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
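The two collapse.type modes described in the issue summary can be sketched roughly as follows. This assumes (an interpretation, not taken from the patch) that collapse.max is the number of documents kept per field value in normal mode and per run of consecutive equal values in adjacent mode; the class and method names are hypothetical, and the input and output are just the collapse field values of the hits in rank order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of normal vs adjacent collapsing; not the patch's implementation. */
final class CollapseTypeSketch {

    /** normal: keep at most max documents per field value, wherever they appear in the ranking. */
    static List<String> collapseNormal(List<String> fieldValuesInRankOrder, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String value : fieldValuesInRankOrder) {
            int occurrences = seen.merge(value, 1, Integer::sum);
            if (occurrences <= max) {
                kept.add(value);
            }
        }
        return kept;
    }

    /** adjacent: keep at most max consecutive documents sharing the same field value. */
    static List<String> collapseAdjacent(List<String> fieldValuesInRankOrder, int max) {
        List<String> kept = new ArrayList<>();
        String previous = null;
        int runLength = 0;
        for (String value : fieldValuesInRankOrder) {
            // A new value resets the run; a repeated value extends it.
            runLength = Objects.equals(value, previous) ? runLength + 1 : 1;
            previous = value;
            if (runLength <= max) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```

For the ranking a, a, b, a, a with max 1, normal mode keeps a, b while adjacent mode keeps a, b, a, because the last two a hits are no longer next to the first one.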

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Bojan Smid (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606986#action_12606986 ] 

Bojan Smid commented on SOLR-236:
---------------------------------

You can check the discussion about this same problem in the posts above (starting on 1 Feb 2008). It seems like a rather complex issue, which could require some serious refactoring of the collapsing code.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment:     (was: field_collapsing.patch)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Doug Steigerwald (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654950#action_12654950 ] 

Doug Steigerwald commented on SOLR-236:
---------------------------------------

I'm having an issue with Ivan's latest patch.  I'm testing on a data set of 8113 documents.  All the documents have a string field called site.  There are only two sites, Site1 and Site2.

Site1 has 3466 documents.
Site2 has 4647 documents.

With the following simple query, I only get 1 result:
http://localhost:8983/solr/core1/search?q=*:*&collapase=true&collapse.field=site

....
<lst name="collapse_counts">
 <str name="field">site</str>
 <lst name="doc">
  <int name="site2-doc-2981790">4646</int>
 </lst>
 <lst name="count">
  <int name="Site2">4646</int>
 </lst>
 <str name="debug">HashDocSet(2) Time(ms): 0/0/0/0</str>
</lst>
<result name="response" numFound="1" start="0">
....

The only result displayed is for Site2.

I have an older patch working with Solr 1.3.0, but I can't get it to mesh with localsolr properly. My localsolr query returns 1656 results; collapsed on site it should return 2 results, but it returns 8, some of which are duplicate documents. Without localsolr, my field collapsing patch seems to work fine.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-236:
---------------------------------------

    Attachment: SOLR-236.patch

# Patch updated for SOLR-1685 and SOLR-1686
# The last patch had reverted the changes to the CollapseComponent configuration in solrconfig.xml and solrconfig-fieldcollapse.xml. Synced them back.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Heigl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850934#action_12850934 ] 

Thomas Heigl commented on SOLR-236:
-----------------------------------

@Robert:

What is your use case for field collapsing? I think under "normal" conditions (collapsing on a field with reasonably many unique values) you can go with the slightly older patch and the OOM fixes. I compared the performance of the newest patch for the trunk with the 1.4 release patched as described above and didn't notice much difference under these conditions. I will most likely go with the trunk, however, as I have millions of documents with millions of unique values in the collapse field and need every bit of performance I can get.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704162#action_12704162 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

How did you fix the memory issue?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Michael Gundlach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gundlach updated SOLR-236:
----------------------------------

    Attachment: quasidistributed.additional.patch

This patch does not include field collapsing itself.

Apply this patch in addition to the latest field collapsing patch, to avoid an NPE when:

 - you are collapsing on a field F,
 - you are sharding into multiple cores, using the hash of field F as your sharding key, AND
 - you perform a distributed search on a tokenized field.

Note that if you attempt to use this patch to collapse on a field F1 and shard according to a field F2, you will get buggy search behavior.
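The reason the collapse field and the shard key must match can be sketched as follows. If documents are routed by a hash of F, every document sharing a value of F lands on the same core, so collapsing on each shard never splits a group; sharding on a different field F2 could spread one F1 group over several cores, so it would be collapsed once per shard. The routing helper below is hypothetical (it is not part of Solr or of this patch); it uses {{String.hashCode()}}, whose algorithm Java specifies, so the mapping is the same on every JVM.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical routing helper: picks the shard from the hash of the collapse field value. */
final class CollapseFieldSharding {

    static int shardFor(String collapseFieldValue, int numShards) {
        // floorMod keeps the index in [0, numShards) even when hashCode() is negative.
        return Math.floorMod(collapseFieldValue.hashCode(), numShards);
    }

    /** Distributes the given collapse field values over numShards buckets, preserving input order. */
    static List<List<String>> route(List<String> collapseFieldValues, int numShards) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new ArrayList<>());
        }
        for (String value : collapseFieldValues) {
            shards.get(shardFor(value, numShards)).add(value);
        }
        return shards;
    }
}
```

Because the shard depends only on the value of F, all documents of one collapse group end up in a single bucket, whatever the number of shards.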

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Peter Karich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ] 

Peter Karich commented on SOLR-236:
-----------------------------------

The latest patch from 1 Feb 2010 compiles against the solr-2010-02-13 nightly build but does not work. If I query

http://searchdev05:15100/cs-bidcs/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat}
HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    ...
{noformat}


I only need the OutOfMemory problem solved ... :-(

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775418#action_12775418 ] 

Shalin Shekhar Mangar commented on SOLR-236:
--------------------------------------------

I'm using Martijn's patch from 2009-10-27. The FieldCollapseResponse#parseDocumentIdCollapseCounts assumes the unique key is a long. Is that a bug or an undocumented limitation?
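If the unique key is not numeric (for example ids like site2-doc-2981790, as in Doug's response earlier in this thread), parsing it as a long fails. A minimal sketch of key-type-agnostic parsing, assuming the per-document counts arrive as an ordered mapping from unique key to count; {{DocIdCollapseCounts}} is a hypothetical name, and a plain {{Map}} stands in for the actual response structure that FieldCollapseResponse reads.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: parse document-id collapse counts without assuming a numeric unique key. */
final class DocIdCollapseCounts {

    static Map<String, Integer> parse(Map<String, ?> rawCounts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : rawCounts.entrySet()) {
            // Keep the unique key as a String instead of calling Long.parseLong on it,
            // which would throw NumberFormatException for keys like "site2-doc-2981790".
            // Number covers both Integer and Long count values.
            counts.put(entry.getKey(), ((Number) entry.getValue()).intValue());
        }
        return counts;
    }
}
```

Keeping the key as a String works for numeric and non-numeric unique keys alike; callers that know their key is numeric can still convert it themselves.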

Nice work guys! We should definitely get this into Solr 1.5

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566864#action_12566864 ] 

oleg_gnatovskiy edited comment on SOLR-236 at 2/7/08 4:18 PM:
--------------------------------------------------------------

Hello everyone. I am planning to implement chain collapsing in a high-traffic production environment, so I'd like to use a stable version of Solr. It doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count for the documents other than the one being displayed?

As a result I see:

<lst name="collapse_counts">
    <int name="Restaurant">2414</int>
    <int name="Bar/Club">9</int>
    <int name="Directory & Services">37</int>
</lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory & Services? If so, then that's great.

However, when I collapse on some fields I get an empty collapse_counts list. It could be that those fields have a large number of different values to collapse on. Is there a limit to the number of values that collapse_counts displays?

Thanks in advance for any help you can provide!

      was (Author: oleg_gnatovskiy):
    Hello everyone. I am planning to implement chain collapsing in a high-traffic production environment, so I'd like to use a stable version of Solr. It doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count for the documents other than the one being displayed?

As a result I see:

<lst name="collapse_counts">
    <int name="Restaurant">2414</int>
    <int name="Bar/Club">9</int>
    <int name="Directory & Services">37</int>
</lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory & Services? If so, then that's great.

However, when I collapse on some integer fields I get an empty list for collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!
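To make the collapse_counts question concrete, here is a minimal Java sketch of "normal" (non-adjacent) collapsing. It assumes, without confirmation from the patch, that collapse.max is the number of documents shown per field value and that each collapse_counts entry is the number of further documents hidden for that value; NormalCollapser and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "normal" (non-adjacent) collapsing. Assumptions: collapse.max is
// the number of documents kept per field value, and collapse_counts reports how many
// further documents were hidden for each value. This is not the SOLR-236 patch code.
class NormalCollapser {

    static final class Result {
        final List<Integer> keptDocs = new ArrayList<>();                  // survivors, in rank order
        final Map<String, Integer> collapseCounts = new LinkedHashMap<>(); // value -> hidden docs
    }

    /**
     * @param docIds      document ids, already in ranked order
     * @param fieldValues collapse.field value of each document (same index as docIds)
     * @param max         documents allowed per field value (collapse.max)
     */
    static Result collapse(int[] docIds, String[] fieldValues, int max) {
        Result result = new Result();
        Map<String, Integer> keptPerValue = new HashMap<>();
        for (int i = 0; i < docIds.length; i++) {
            String value = fieldValues[i];
            int keptSoFar = keptPerValue.getOrDefault(value, 0);
            if (keptSoFar < max) {
                keptPerValue.put(value, keptSoFar + 1);
                result.keptDocs.add(docIds[i]);
            } else {
                // Value already shown max times: hide this doc and count it.
                result.collapseCounts.merge(value, 1, Integer::sum);
            }
        }
        return result;
    }
}
```

With the ranked values Restaurant, Restaurant, Bar/Club, Restaurant, Bar/Club, Directory & Services and collapse.max=1, this sketch keeps the 1st, 3rd and 6th documents and reports Restaurant=2 and Bar/Club=1. Under these assumptions, a value whose documents all fit within collapse.max gets no collapse_counts entry at all.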
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792917#action_12792917 ] 

Yonik Seeley commented on SOLR-236:
-----------------------------------

First, thanks to everyone who has spent so much time working on this - lack of committer attention doesn't equate to lack of interest... this is a very much needed feature!

I'd agree with Erik that the most important thing is the interface to the client, and making it well thought out and semantically "tight". Martijn's recent improvements to the response structure are an example of progress in this area. It's also important to think about the interface in terms of how easy it will be to add further features and optimizations, and to support distributed search. If the code isn't sufficiently standalone, we also need to see how easily it fits into the rest of Solr (what APIs it adds or modifies, etc.). Actually implementing performance improvements and more distributed search can come later, as long as we've thought about it now so we haven't boxed ourselves in.

It seems like field collapsing should just be additional functionality of the query component rather than a separate component since it changes the results?

The most basic question about the interface would be how to present groups. Do we stick with a linear document list and supplement that with extra info in a different part of the response (as the current approach does)? Or stick that extra info in with some of the documents somehow? Or, if collapse=true, replace the list of documents with a list of groups, each of which can contain many documents? Which will be easiest for clients to deal with? If you were starting from scratch and didn't have to deal with any of Solr's current shortcomings, what would it look like?

From the wiki:
collapse.maxdocs - what does this actually mean?  I assume it collects arbitrary documents up to the max (normally by index order)?  Does this really make sense?  Does it affect faceting, etc?  If it does make sense, it seems like it would also make sense for normal non-collapsed query results too, in which case it should be implemented at that level.

collapse.info.doc - what does that do?  I understand counts per group, but what's count per doc?

collapse.includeCollapsedDocs.fl - I don't understand this one, and can't find an example on the wiki or blogs.  It says "Parameter indicating to return the collapsed documents in the response"... but I thought documents were included up until collapse.threshold.

collapse.debug - should perhaps just be rolled into debugQuery, or another general debug param (someone recently suggested using a comma-separated list: debug=timings,query, etc.).

Should I be able to specify a completely different sort *within* a group? collapse.sort=... seems nice... what are the implications? One bit of strangeness: it would allow the highly ranked document responsible for the group's position at the top of the list to be dropped from the group because of a different sort criterion within the group. It's not necessarily an implementation problem though (sort values for the group should be maintained separately).

Is there a way to specify the number of groups that I want back instead of the number of documents?  Or am I supposed to just over-request (rows=num_groups_I_want*threshold) and ignore if I get too many documents back?

Random thought: We need a test to make sure this works with multi-select faceting (SimpleFacets asks for the docset of the base query...)

Distributed Search: should be able to use the same type of algorithm that faceting does to ensure accurate counts.

Performance: yes, it looks like the current code uses a *lot* of memory.
Here's an algorithm that I thought of on my last plane ride that can do much better (assuming max() is the aggregation function):

{code}
=================== two pass collapsing algorithm for collapse.aggregate=max ====================
First pass: pretend that collapseCount=1
  - Use a TreeSet as a priority queue since one can remove and insert entries.
  - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
  - compare new doc with smallest element in treeset.  If smaller discard and go to the next doc.
  - If new doc is bigger, look up its group.  Use the Map to find if the group has been added to the TreeSet and add it if not
    (evicting the smallest entry, and its map key, if the TreeSet is full).
  - If the new bigger doc's group is already in the TreeSet, compare with the document in that group.  If bigger, update the node,
    remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)
We will now have the top 10 documents collapsed by the right field with a collapseCount of 1.  Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
 - create a priority queue for each group (10) of size collapseCount
 - re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
 - for each document, find its appropriate priority queue and insert
 - optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups.
Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed.
We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
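The two-pass notes above can be sketched in Java as follows. This is an illustration under stated assumptions, not committed Solr code: matches are in-memory (docId, score, group) records, "bigger" means a higher score with ties going to the lower docId, and the second pass loops over the same list instead of re-executing a Lucene query. TwoPassCollapser, firstPass and secondPass are hypothetical names, and the optimization of skipping queues for groups with no other matches is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeSet;

// Sketch of the two-pass idea for collapse.aggregate=max. Assumptions: matches are given as
// in-memory (docId, score, group) records, "bigger" means a higher score with ties broken by
// the lower docId, and the "re-executed query" is a second loop over the same list.
// All names are hypothetical.
class TwoPassCollapser {

    record Doc(int id, float score, String group) {}

    // Ascending order, so TreeSet.first() is the smallest (weakest) entry.
    static final Comparator<Doc> WEAKEST_FIRST = (a, b) -> {
        int c = Float.compare(a.score(), b.score());
        return c != 0 ? c : Integer.compare(b.id(), a.id()); // on ties, higher docId is weaker
    };

    /** First pass (collapseCount=1): the best doc of each of the top n groups, best first. */
    static List<Doc> firstPass(List<Doc> docs, int n) {
        if (n <= 0) return List.of();
        TreeSet<Doc> queue = new TreeSet<>(WEAKEST_FIRST); // never holds more than n docs
        Map<String, Doc> groupHead = new HashMap<>();      // group -> its entry in the TreeSet
        for (Doc doc : docs) {
            // Once the set is full, a doc no bigger than the smallest entry is discarded.
            if (queue.size() == n && WEAKEST_FIRST.compare(doc, queue.first()) <= 0) continue;
            Doc head = groupHead.get(doc.group());
            if (head != null) {
                // Group already present: if bigger, remove and re-add to re-sort.
                if (WEAKEST_FIRST.compare(doc, head) > 0) {
                    queue.remove(head);
                    queue.add(doc);
                    groupHead.put(doc.group(), doc);
                }
            } else {
                // New group: evict the weakest group first if the set is full.
                if (queue.size() == n) groupHead.remove(queue.pollFirst().group());
                queue.add(doc);
                groupHead.put(doc.group(), doc);
            }
        }
        return new ArrayList<>(queue.descendingSet());
    }

    /** Second pass: the best collapseCount docs of each top group, best first. */
    static Map<String, List<Doc>> secondPass(List<Doc> docs, List<Doc> topGroups, int collapseCount) {
        Map<String, PriorityQueue<Doc>> queues = new LinkedHashMap<>();
        for (Doc head : topGroups) queues.put(head.group(), new PriorityQueue<>(WEAKEST_FIRST));
        for (Doc doc : docs) {
            PriorityQueue<Doc> pq = queues.get(doc.group());
            if (pq == null) continue;                 // not one of the top groups
            pq.add(doc);
            if (pq.size() > collapseCount) pq.poll(); // drop the weakest doc of the group
        }
        Map<String, List<Doc>> result = new LinkedHashMap<>();
        queues.forEach((group, pq) -> {
            List<Doc> sorted = new ArrayList<>(pq);
            sorted.sort(WEAKEST_FIRST.reversed());
            result.put(group, sorted);
        });
        return result;
    }
}
```

Both the TreeSet and the HashMap in the first pass stay bounded by n, and the second pass keeps only n heaps of size collapseCount, which is where the saving over keeping state for every group comes from. For reference, the 40MB figure above corresponds to one 4-byte float score per document: 10,000,000 × 4 bytes = 40,000,000 bytes.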


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770358#action_12770358 ] 

Lance Norskog commented on SOLR-236:
------------------------------------

This looks like a really nice rework! This JIRA has been a marathon (2.5 years!), but maybe the last miles are here.

Since this JIRA has so many comments, it is hard to navigate. Maybe it is a good time to close it and start a new active JIRA for the field collapsing project. 

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Shekhar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718969#action_12718969 ] 

Shekhar commented on SOLR-236:
------------------------------

Hi,

Has anyone successfully used localsolr and the collapse patch together in Solr 1.4-dev? I am getting two result sets, one from localsolr and the other from collapse. I need a merged result set.
I am using localsolr 1.5 and field-collapse-solr-236-2.patch.
Any pointers?



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Charles Hornberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556652#action_12556652 ] 

clh edited comment on SOLR-236 at 1/7/08 1:51 PM:
-----------------------------------------------------------------

bq. UPDATE: Doug Steigerwald's patch (field_collapsing_dsteigerwald.diff) applies cleanly to trunk

I'm having trouble applying field_collapsing_1.3.patch to the head of trunk.

{noformat}
charlie@macbuntu:~/solr/src/java$ patch -p0 < /home/charlie/downloads/field_collapsing_1.3.patch 
patching file org/apache/solr/search/CollapseFilter.java
patching file org/apache/solr/search/SolrIndexSearcher.java
Hunk #1 succeeded at 694 (offset -8 lines).
Hunk #2 succeeded at 1252 (offset -1 lines).
patching file org/apache/solr/common/params/CollapseParams.java
patching file org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 90.
Hunk #3 FAILED at 117.
3 out of 3 hunks FAILED -- saving rejects to file org/apache/solr/handler/StandardRequestHandler.java.rej
patching file org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #1 FAILED at 31.
Hunk #2 FAILED at 40.
Hunk #3 FAILED at 311.
Hunk #4 FAILED at 339.
4 out of 4 hunks FAILED -- saving rejects to file org/apache/solr/handler/DisMaxRequestHandler.java.rej
{noformat}

I'm guessing that maybe the field collapsing patch needs to be updated for the SearchHandler refactoring that was done as part of SOLR-281? If so, I'll take a whack at migrating the changes to SearchHandler.java, and see if I can produce a better patch.

  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679662#action_12679662 ] 

Mark Miller commented on SOLR-236:
----------------------------------

bq. Thanks. In the wiki, next to each one of these parameters, it explicitly says that reducing this parameter will decrease memory usage; this is why we reduced these parameters (it did not mention the filterCache at all).

They will save RAM to a certain extent in certain situations, but they are not very helpful at the sizes you are working with (and they are not settings I would use to save RAM anyway, unless the amount I needed to save was pretty small). Also, the savings are largely index side, which is not likely a huge part of your RAM concerns; those are search side.

bq. My filterCache stats are great: as you know, it's set to 64K, but right now, with almost all the RAM used up (we're at 71.9% now), it's only using 36290 entries and it's holding pretty steady there (even as RAM usage increased by 10%). None of the other caches have gone up much either. We have no cache evictions at all, and a 99% hit ratio.

The sizes may be higher than you need, then. They should be adjusted to the best settings based on the wiki info. I was originally suggesting you might sacrifice speed with the caches in exchange for RAM, but it's always best to use the best settings and have the necessary RAM.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Billy Morgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858333#action_12858333 ] 

Billy Morgan commented on SOLR-236:
-----------------------------------

@Claus

I am having the same issue

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Dave Redford (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694851#action_12694851 ] 

Dave Redford commented on SOLR-236:
-----------------------------------

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request.

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong order (note: Id is our unique Id),

but this gives the correct order:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:

fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that great so far...


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771155#action_12771155 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

It certainly has been going on for a long time :-)
Talking about the last miles, there are a few things on my mind about field collapsing:
* Change the response format. Currently, even I sometimes get confused by the information returned when I look at the response. The response should be more structured. Something like this:
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="results">
        <lst name="233238"> <!-- id of most relevant document of the group -->
            <str name="fieldValue">melkweg</str>
            <int name="collapseCount">2</int>
            <!-- and other CollapseCollector specific collapse information -->
        </lst>
        ...
    </lst>
</lst>
{code}
Currently, when doing adjacent field collapsing, _collapse_counts_ gives unusable results. _collapse_counts_ uses the field value as the key, which is not unique for adjacent collapsing, as shown in this example:
{code:xml}
<lst name="collapse_counts">
 <int name="hard">1</int>
 <int name="hard">1</int>
 <int name="electronics">1</int>
 <int name="memory">2</int>
 <int name="monitor">1</int>
</lst>
{code}
* Add the notion of a CollapseMatcher, which decides whether document field values are equal or not and thus whether they may be collapsed. This opens the way to more exotic features like fuzzy field collapsing and collapsing on more than one field. It also allows users of the patch to easily implement their own matching rules.
* Distributed field collapsing. Although I have some ideas on how to get started, from my perspective it is not going to perform well, because the field collapse state somehow has to be shared between shards in order to do proper field collapsing. This state can potentially be a lot of data, depending on the specific search and corpus.
* And maybe add a collapse collector that collects statistics about the most common field values per collapsed group.

I think this is roughly my roadmap for field collapsing at the moment, but feel free to elaborate on it.
Btw, I have recently written a [blog post|http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/] about field collapsing in general that might be handy for anyone implementing field collapsing.
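The response-format and CollapseMatcher points can be illustrated together. The following is a minimal Python sketch, not the patch's Java code: it assumes adjacent collapsing over (document id, field value) pairs already in result order, keys each group by the id of its head document as in the proposed XML, and takes a pluggable matcher function. The names `CollapseGroup`, `exact_match`, `case_insensitive_match` and `adjacent_collapse_groups` are hypothetical, and `collapse_count` is assumed to mean "documents hidden behind the head".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for the proposed CollapseMatcher: decides whether
# two field values may be collapsed into the same group.
Matcher = Callable[[str, str], bool]


def exact_match(a: str, b: str) -> bool:
    return a == b


def case_insensitive_match(a: str, b: str) -> bool:
    # A trivial "fuzzy" rule, only to show that matching is pluggable.
    return a.lower() == b.lower()


@dataclass
class CollapseGroup:
    head_id: str             # id of the most relevant document of the group
    field_value: str
    collapse_count: int = 0  # assumed: number of documents hidden behind the head


def adjacent_collapse_groups(
    docs: List[Tuple[str, str]], matcher: Matcher = exact_match
) -> List[CollapseGroup]:
    """Adjacent collapsing; each run of matching values becomes one group."""
    groups: List[CollapseGroup] = []
    for doc_id, value in docs:
        # Compare against the head of the current run, not the previous doc.
        if groups and matcher(groups[-1].field_value, value):
            groups[-1].collapse_count += 1
        else:
            groups.append(CollapseGroup(doc_id, value))
    return groups


docs = [("101", "hard"), ("102", "hard"), ("103", "memory"),
        ("104", "hard"), ("105", "hard")]
groups = adjacent_collapse_groups(docs)
# Keyed by head id: three distinct groups survive.
print([(g.head_id, g.field_value, g.collapse_count) for g in groups])
# Keyed by field value (the current format): one "hard" group is lost.
print({g.field_value: g.collapse_count for g in groups})  # {'hard': 1, 'memory': 0}
```

Keying by the head document id is what keeps the two "hard" runs apart; swapping in `case_insensitive_match` changes only the grouping rule, not the collapsing loop.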

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Kevin Cunningham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831617#action_12831617 ] 

Kevin Cunningham commented on SOLR-236:
---------------------------------------

No, just field collapsing. We went back to field-collapse-5.patch for the time being. So far it's been good; we updated only to get closer to the latest version, not because we were seeing issues. Thanks.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Stephen Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655750#action_12655750 ] 

Stephen Weiss commented on SOLR-236:
------------------------------------

I'm using Ivan's patch and running into some trouble with faceting...

Basically, I can tell that faceting is happening after the collapse, because the facet counts are definitely lower than they would be otherwise. For example, one search gives 196 results without collapsing and 120 results with collapsing, but the facet count is 119??? In other searches the difference is more drastic: in one, I get 61 results without collapsing and 61 with collapsing, but the facet count is 39.

Having looked at it for a while now, I think I can guess what the problem might be...

The incorrect counts seem to happen only when the term in question does not occur evenly across all duplicates of a document. That is, multiple document records may exist for the same image (it's an image search engine), but each document will have different terms in different fields depending on the audience it targets. So, when you collapse, the counts are lower than they should be, because when you actually execute a search with that facet's term included in the query, *all* the documents after collapsing will be ones that have that term.

Here's an illustration:

Collapse field is "link_id", facet field is "keyword":


Doc 1:
id: 123456,
link_id: 2,
keyword: Black, Printed, Dress

Doc 2:
id: 123457,
link_id: 2,
keyword: Black, Shoes, Patent

Doc 3:
id: 123458,
link_id: 2,
keyword: Red, Hat, Felt

Doc 4:
id: 123459,
link_id: 1,
keyword: Felt, Hat, Black

So, when you collapse, only two of these documents are in the result set (123456, 123459), and only the keywords Black, Printed, Dress, Felt, and Hat are counted. The facet count for Black is 2, and the facet count for Felt is 1. If you choose Black and add it to your query, you get 2 results (great). However, if you add *Felt* to your query, you also get 2 results even though its facet count was 1, because a different document for link_id 2 is chosen in that query than in the more general query from which the facets are produced.

I think what needs to happen here is that all the terms for all the documents that are collapsed together need to be included (just once) with the document that gets counted for faceting.  In this example, when the document for link_id 2 is counted, it would need to appear to the facet counter to have keywords Black, Printed, Dress, Shoes, Patent, Red, Hat, and Felt, as opposed to just Black, Printed, and Dress.
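To make the discrepancy concrete, here is a minimal Python sketch (not code from Ivan's patch) using the four example documents. It assumes the list is already in result order and that normal collapsing keeps the first document per `link_id`. `facet_on_heads` models the observed behavior, where only the group head's terms are counted; `facet_on_group_union` models the fix proposed above, where each term is counted once per group if any member has it. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# The example documents; collapse field "link_id", facet field "keyword".
DOCS: List[Dict] = [
    {"id": 123456, "link_id": 2, "keyword": ["Black", "Printed", "Dress"]},
    {"id": 123457, "link_id": 2, "keyword": ["Black", "Shoes", "Patent"]},
    {"id": 123458, "link_id": 2, "keyword": ["Red", "Hat", "Felt"]},
    {"id": 123459, "link_id": 1, "keyword": ["Felt", "Hat", "Black"]},
]


def collapse_heads(docs: List[Dict], field: str) -> List[Dict]:
    """Normal collapsing: keep the first document seen for each field value."""
    heads: Dict[object, Dict] = {}
    for doc in docs:
        heads.setdefault(doc[field], doc)
    return list(heads.values())


def facet_on_heads(docs: List[Dict], collapse_field: str, facet_field: str) -> Counter:
    """Observed behavior: only each group head's terms are counted."""
    counts: Counter = Counter()
    for doc in collapse_heads(docs, collapse_field):
        counts.update(doc[facet_field])
    return counts


def facet_on_group_union(docs: List[Dict], collapse_field: str, facet_field: str) -> Counter:
    """Proposed fix: a term counts once per group if any member has it."""
    terms_by_group: Dict[object, set] = {}
    for doc in docs:
        terms_by_group.setdefault(doc[collapse_field], set()).update(doc[facet_field])
    counts: Counter = Counter()
    for terms in terms_by_group.values():
        counts.update(terms)
    return counts


print(facet_on_heads(DOCS, "link_id", "keyword")["Felt"])        # 1
print(facet_on_group_union(DOCS, "link_id", "keyword")["Felt"])  # 2
```

With the union approach the Felt count becomes 2, matching the 2 results returned when Felt is added to the query.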


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Dave Redford (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694851#action_12694851 ] 

Dave Redford edited comment on SOLR-236 at 4/1/09 5:56 PM:
-----------------------------------------------------------

There is an issue with collapsed result ordering when querying with only the unique Id and score fields in the request.

e.g.:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

gives the wrong ordering (note: Id is our unique Id).

But adding another field, even a bogus one, works:
q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score,bogus&collapse.field=PrimaryId&collapse.max=1

Using an fq also makes it work, e.g.:
fq=Type:articles&q=ford&version=2.2&start=0&rows=10&indent=on&fl=Id,score&collapse.field=PrimaryId&collapse.max=1

I'm using the latest Dmitry patch (25/mar/09) against 1.3.0.

Apart from that, great so far...


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564204#action_12564204 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

That works, thanks :-)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503162 ] 

Emmanuel Keller commented on SOLR-236:
--------------------------------------

Do we have to make a choice? Both behaviors are interesting.
What about a new parameter like collapse.facet=[pre|post]?
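The two behaviors differ only in which document set the facet counts are taken from. This minimal Python sketch assumes "pre" means faceting over every matching document and "post" means faceting over only the documents that survive normal collapsing (here, the first document per collapse value). The `facet_counts` function and its `mode` argument are hypothetical and simply mirror the proposed `collapse.facet=[pre|post]` values.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(docs: List[Dict], collapse_field: str, facet_field: str,
                 mode: str) -> Counter:
    """Count facet_field values before ("pre") or after ("post") collapsing.

    mode="pre":  facet over every matching document; collapsing is ignored.
    mode="post": facet only over the first document per collapse_field value.
    """
    if mode == "pre":
        counted = docs
    elif mode == "post":
        heads: Dict[object, Dict] = {}
        for doc in docs:
            heads.setdefault(doc[collapse_field], doc)
        counted = list(heads.values())
    else:
        raise ValueError("mode must be 'pre' or 'post'")
    return Counter(doc[facet_field] for doc in counted)


results = [
    {"site": "a.example", "type": "pdf"},
    {"site": "a.example", "type": "html"},
    {"site": "b.example", "type": "pdf"},
]
print(facet_counts(results, "site", "type", "pre"))   # Counter({'pdf': 2, 'html': 1})
print(facet_counts(results, "site", "type", "post"))  # Counter({'pdf': 2})
```

In "post" mode the html count drops to zero because the only html document was collapsed behind a pdf document from the same site, which is the kind of difference users on this issue have reported.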



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563779#action_12563779 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

Hello, I am new to Solr, so forgive me if what I say doesn't make sense... None of the patches for 1.3 work anymore, since the file org.apache.solr.handler.SearchHandler has been removed from the nightly builds. Will someone write a new patch that works with the current nightly builds? If not, could we get a copy of an old nightly build somewhere? Thanks a lot.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538327 ] 

Ryan McKinley commented on SOLR-236:
------------------------------------

Hi Tracy-

There has not been much movement on this while we get SOLR-281 sorted (I hope this happens soon) -- once that is in, there will hopefully be an updated patch on the 1.3 branch that will be posted here.

"1.3" is not a branch yet -- it is the trunk revision that most patches work with. Only when it becomes an official release will it actually be called 1.3 in the repository.

If you need to show field collapsing soon, I think your best bet (I have not tried it) is to apply 'field_collapsing_1.1.0.patch' to the 1.1.0 release ( http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.1.0/ ). But if you can wait a few weeks, it will *hopefully* be available in trunk (or easily patchable from trunk).

ryan



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "dieter grad (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dieter grad updated SOLR-236:
-----------------------------

    Attachment: collapsing-patch-to-1.3.0-dieter.patch


I had to make a patch to fix two issues that affected our system. I am not familiar with this code, so maybe someone can pick up this patch and turn it into something useful for everybody.

The fixes are:

1) When collapse.facet=before, only the collapsed documents are returned (and not the whole collection).

2) With normal collapsing, the selected sort order is preserved by returning the first document of each collapsed group.

For example, if the values of the collapsing field are:

1) Y
2) X
3) X
4) Y
5) X
6) Z

the documents returned are 1, 2 and 6, in that order.

So, for example, if you sort by price ascending, you will get the results sorted by price, where each item is the cheapest item of its collapsed group.
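The Y/X/X/Y/X/Z example also shows how normal and adjacent collapsing differ. Below is a minimal Python sketch, not the patch's implementation, that assumes documents arrive as (number, collapse value) pairs already in the requested sort order. The `limit` argument is a hypothetical stand-in for `collapse.max`, assumed to mean "keep at most this many documents per group".

```python
from collections import Counter
from itertools import groupby, islice
from typing import List, Tuple

Doc = Tuple[int, str]  # (document number, collapse field value), in sort order


def collapse_normal(docs: List[Doc], limit: int = 1) -> List[int]:
    """Keep at most `limit` documents per field value, anywhere in the results."""
    seen: Counter = Counter()
    kept: List[int] = []
    for doc_id, value in docs:
        if seen[value] < limit:
            kept.append(doc_id)
        seen[value] += 1
    return kept


def collapse_adjacent(docs: List[Doc], limit: int = 1) -> List[int]:
    """Keep at most `limit` documents from each run of consecutive equal values."""
    kept: List[int] = []
    # groupby only groups *consecutive* equal keys, which is adjacent collapsing.
    for _value, run in groupby(docs, key=lambda d: d[1]):
        kept.extend(doc_id for doc_id, _ in islice(run, limit))
    return kept


docs = [(1, "Y"), (2, "X"), (3, "X"), (4, "Y"), (5, "X"), (6, "Z")]
print(collapse_normal(docs))    # [1, 2, 6]
print(collapse_adjacent(docs))  # [1, 2, 4, 5, 6]
```

Because both functions walk the list in sort order and keep the earliest members of each group, the surviving document is always the best-sorted one (for example, the cheapest when sorting by price ascending).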




> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674803#action_12674803 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

Are there any concrete plans for where this feature is going? Is it ever going to get support for distributed search?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495356 ] 

Emmanuel Keller commented on SOLR-236:
--------------------------------------

My turn to miss something ;)
You are right, we have to use params.required().get("collapse.field"). 

About collapse info:
<int name="has_more_results">3</int> means that the third doc of the result has been collapsed and that some consecutive results with the same field value have been removed.
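A minimal Python sketch of how such positions could be derived, assuming the original patch's semantics as described here: adjacent collapsing, with `collapse.max` read as "keep at most this many consecutive hits with the same value", and `has_more_results` reporting the 1-based position in the collapsed result after which hits were removed. `collapse_with_info` and `max_run` are hypothetical names, not part of the patch.

```python
from itertools import groupby
from typing import List, Tuple


def collapse_with_info(values: List[str], max_run: int = 1) -> Tuple[List[int], List[int]]:
    """Adjacent collapsing with a collapse.max-style run limit (assumed semantics).

    values: the collapse field value of each hit, in result order.
    Returns (kept, has_more):
      kept     -- indexes into `values` of the hits that remain
      has_more -- 1-based positions in the collapsed result after which
                  consecutive hits with the same value were removed
    """
    kept: List[int] = []
    has_more: List[int] = []
    for _value, run_iter in groupby(range(len(values)), key=lambda i: values[i]):
        run = list(run_iter)
        kept.extend(run[:max_run])
        if len(run) > max_run:
            has_more.append(len(kept))  # position of the last kept hit of this run
    return kept, has_more


kept, has_more = collapse_with_info(["a", "b", "c", "c", "c", "d"])
print(kept)      # [0, 1, 2, 5]
print(has_more)  # [3]
```

Here the run of three "c" hits is reduced to one, so the third document of the collapsed result is the one flagged, matching the `has_more_results` value of 3 in the example above.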

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793476#action_12793476 ] 

Yonik Seeley commented on SOLR-236:
-----------------------------------

bq. You think that collapse.collectDiscardedDocuments.fl is better?

Is this something that's really needed?  If so, some other name ideas could be
collapse.discarded.fl  
collapse.discarded.limit  (doesn't seem to be a good idea to have an unbounded number).

bq. Just one thought I had about the algorithm you propose. If you only create collapse groups for the top ten documents, then what about the total count of the search? Unique documents outside the top ten documents are not being grouped (if I understand you correctly), and that would affect the total count as it currently works.

Right - one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing.  So only collapse.facet=before would be supported.  I do think that just like faceting, there will be multiple ways of doing collapsing.

Anyway, this is a great example of trying to make sure the interface doesn't preclude optimizations.  Perhaps the total count of the search (numFound) should be pre-collapsing if collapse.facet=before, or perhaps it should always be pre-collapsing, and we should have another optional count for post-collapsing?
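The difference between the two totals can be sketched as follows, assuming normal collapsing with a given collapse.max. The pre-collapse count is just the number of hits. The post-collapse count needs the group of every hit, which is exactly the information a top-N-only algorithm would not have. The names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch contrasting pre- and post-collapsing totals (normal collapsing).
public class CollapseCountsSketch {

    /** Pre-collapse total: every hit counts, no grouping needed. */
    static int preCollapseCount(List<String> values) {
        return values.size();
    }

    /** Post-collapse total: each group contributes at most `max` hits. */
    static int postCollapseCount(List<String> values, int max) {
        Map<String, Integer> perValue = new HashMap<>();
        for (String value : values) {
            perValue.merge(value, 1, Integer::sum); // requires the group of *every* hit
        }
        int total = 0;
        for (int groupSize : perValue.values()) {
            total += Math.min(groupSize, max);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("a", "a", "b", "a", "c");
        System.out.println("numFound before collapsing: " + preCollapseCount(hits));   // 5
        System.out.println("numFound after collapsing:  " + postCollapseCount(hits, 1)); // 3
    }
}
```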

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

Hi Paul, thanks for pointing this out. I also tried to hammer my Solr instance and I got the same exceptions, which is not good. I have attached a patch that fixes these exceptions. The problem was indeed centred around the collapseRequest field and I have fixed this by using a ThreadLocal that holds the CollapseRequest instance. Because of this the reference to the CollapseRequest is not shared across the search requests and thus a new thread cannot interfere with a collapse request that is still being used by another thread.
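The general ThreadLocal pattern described here can be sketched as below. It assumes that one component instance is shared by all concurrent requests, so a plain field would be overwritten. CollapseRequest here is a simplified, hypothetical stand-in for the patch's class. Clearing the slot in a finally block matters because servlet containers reuse pooled threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-thread request state; not the patch's actual code.
public class ThreadLocalRequestSketch {

    /** Simplified stand-in for the per-request collapse settings. */
    static final class CollapseRequest {
        final String field;
        CollapseRequest(String field) { this.field = field; }
    }

    // One slot per thread: concurrent searches no longer overwrite each other's request.
    private static final ThreadLocal<CollapseRequest> CURRENT = new ThreadLocal<>();

    static String handle(String field) {
        CURRENT.set(new CollapseRequest(field));
        try {
            Thread.yield();             // give other threads a chance to interleave
            return CURRENT.get().field; // always this thread's own request
        } finally {
            CURRENT.remove();           // don't leak state into the next request on this pooled thread
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicInteger mismatches = new AtomicInteger();
        for (int i = 0; i < 1000; i++) {
            String field = "field" + i;
            pool.submit(() -> {
                if (!handle(field).equals(field)) mismatches.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("mismatches: " + mismatches.get()); // 0
    }
}
```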

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

[jira] Commented: (SOLR-236) Field collapsing

Posted by "Shekhar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720908#action_12720908 ] 

Shekhar commented on SOLR-236:
------------------------------

Thanks a lot for your help, Martijn.
Could you please point me to the example you are referring to? I could not find any example that uses DistanceCalculatingComponent.


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

[jira] Commented: (SOLR-236) Field collapsing

Posted by "Muddassir hasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618065#action_12618065 ] 

Muddassir hasan commented on SOLR-236:
--------------------------------------

I tried to use this patch but couldn't make it work. I compiled Solr with the patch applied and added the following to my solrconfig: <searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" /> and 
<requestHandler name="/search" class="org.apache.solr.handler.component.SearchHandler">
    <arr name="components">
      <str>collapse</str>
    </arr>
</requestHandler>

It started perfectly, but I could not see any collapsing when using
collapse.field=key_string11&collapse.type=normal&collapse.max=1&collapse=true

If I'm missing something, please let me know what my mistake is.
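In a setup like this, it is worth checking that the request actually reaches the handler registered as /search rather than the default /select handler, since in this configuration only /search lists the collapse component. Here is a small sketch that builds such a request URL. The host, port, path prefix and the q value are assumptions. The collapse parameters are the ones from the comment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for building a collapse request URL; host and path are assumptions.
public class CollapseRequestUrl {

    /** Builds a query string from ordered parameters, URL-encoding each key and value. */
    static String build(String baseUrl, Map<String, String> params) {
        return baseUrl + "?" + params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "*:*");
        params.put("collapse", "true");
        params.put("collapse.field", "key_string11");
        params.put("collapse.type", "normal");
        params.put("collapse.max", "1");
        // "/search" matches the requestHandler name in the config above.
        System.out.println(build("http://localhost:8983/solr/search", params));
    }
}
```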

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Iván de Prado (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655269#action_12655269 ] 

ivan.prado edited comment on SOLR-236 at 12/10/08 8:34 AM:
--------------------------------------------------------------

I have attached a new patch that solves the problems in my first submitted patch. Doug Steigerwald, could you check whether this patch works for you? Thanks. 

      was (Author: ivan.prado):
    A new patch with problems solved in my first submitted patch. 
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch

[jira] Commented: (SOLR-236) Field collapsing

Posted by "Patrick Eger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792587#action_12792587 ] 

Patrick Eger commented on SOLR-236:
-----------------------------------

Hi, this is possibly not important, but I would like to give my perspective as a user. Specifically, the code is very much production ready in our opinion, albeit under a limited set of circumstances that we are comfortable with (< 5 million docs, no distributed search). Within those confines it works great and satisfies our needs, and we are more than willing to pay the performance hit since it's absolutely essential to the correct functionality. I suppose I'd disagree with the assertion that the performance is "unacceptable", as I think that is a value judgement each user will have to make.

Modulo the discussion about the request format, output format and config (stuff that is hard to change later), I would much rather have the code committed and documented, with those caveats clearly spelled out and probably tracked in separate JIRA issues, i.e. DO NOT USE IF SHARDING, >5 million docs, etc. Again, just my 2c as a satisfied user.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Ron Veenstra (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716128#action_12716128 ] 

Ron Veenstra edited comment on SOLR-236 at 6/3/09 7:22 PM:
-----------------------------------------------------------

I require assistance. I've installed a fresh Solr (1.3.0), and everything appears to operate well. I then patched it using SOLR-236_collapsing.patch [by Thomas Traeger] (the last patch I saw that claimed to work with 1.3.0), without error. I then added the following to solrconfig.xml (per http://wiki.apache.org/solr/FieldCollapsing):

  <searchComponent name="collapse"     class="org.apache.solr.handler.component.CollapseComponent" />

Upon restart, I get a long configuration error, which seems to hinge on:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solrconfig.xml ------------------------------------------------------------- org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.CollapseComponent' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)

[the full error can be included if desired.]

I've verified that the CollapseComponent file exists in the proper place.
I've moved CollapseParams as required (i.e. moved CollapseParams.java from common/org/apache/solr/common/params to java/org/apache/solr/common/params/).
I've tried multiple iterations of the patch (on fresh installs), all with the same issue.

Are there additional steps, patches, or configurations that are required?
Is this a known issue?
Any help is very much appreciated.
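An "Error loading class" from SolrResourceLoader typically indicates that the class is not on the classpath Solr actually runs with. A source file being present in the tree does not by itself mean it was compiled into the deployed webapp. Below is a small, hypothetical check that can be run against the same jars/classes Solr uses. The class name is the one from the error message.

```java
// Hypothetical diagnostic: is the collapse component loadable from a given class loader?
public class ClassPresenceCheck {

    /** Returns true if the named class can be loaded by the given class loader. */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // false: do not run static initializers
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // missing class, or a dependency of it failed to link
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.solr.handler.component.CollapseComponent";
        ClassLoader loader = ClassPresenceCheck.class.getClassLoader();
        System.out.println(name + " loadable: " + isLoadable(name, loader));
    }
}
```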

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Traeger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749314#action_12749314 ] 

Thomas Traeger commented on SOLR-236:
-------------------------------------

Hi Martin, I tested your latest patch and found no problems so far. The code is indeed easier to understand now; good work.

For my current project I need to know which documents have been removed during collapsing. The current idea is to change the collapsing info and add an array with all document IDs that were removed from the result. Any suggestions on how/where to implement this?
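Here is a minimal sketch of one way to record the removed documents. It assumes normal collapsing and groups the removed doc ids by collapse value. The per-group cap follows the collapse.discarded.limit idea raised elsewhere in this thread, since an unbounded list could get large. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collect removed doc ids per collapse value, capped per group.
public class DiscardedDocsSketch {

    /** docIds and values are parallel arrays of hits in score order. */
    static Map<String, List<Integer>> discarded(int[] docIds, String[] values, int max, int discardLimit) {
        Map<String, Integer> seen = new HashMap<>();
        Map<String, List<Integer>> removed = new LinkedHashMap<>(); // groups in first-removal order
        for (int i = 0; i < docIds.length; i++) {
            int count = seen.merge(values[i], 1, Integer::sum);
            if (count > max) { // this hit is collapsed away
                List<Integer> ids = removed.computeIfAbsent(values[i], k -> new ArrayList<>());
                if (ids.size() < discardLimit) {
                    ids.add(docIds[i]); // only the first `discardLimit` removed ids are kept
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        int[] ids = {10, 11, 12, 13, 14};
        String[] sites = {"a", "a", "b", "a", "a"};
        System.out.println(discarded(ids, sites, 1, 2)); // {a=[11, 13]}
    }
}
```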

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638355#action_12638355 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

What's a hard drive sort?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>             Fix For: 1.4
>
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch

[jira] Commented: (SOLR-236) Field collapsing

Posted by "JList (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606953#action_12606953 ] 

JList commented on SOLR-236:
----------------------------

Although field collapsing worked fine in my brief testing, I got exceptions when I put it to work with more documents. It seems to have something to do with the queries (or documents, since different queries return different documents). With some queries, this exception does not happen.

If I remove the collapse.* parameters, the error does not happen. Any idea why this is happening? Thanks.


HTTP ERROR: 500
Unsupported Operation

org.apache.solr.common.SolrException: Unsupported Operation
        at org.apache.solr.search.NegatedDocSet.iterator(NegatedDocSet.java:77)
        at org.apache.solr.search.DocSetBase.getBits(DocSet.java:183)
        at org.apache.solr.search.NegatedDocSet.getBits(NegatedDocSet.java:27)
        at org.apache.solr.search.DocSetBase.intersection(DocSet.java:199)
        at org.apache.solr.search.BitDocSet.intersection(BitDocSet.java:30)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1109)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:811)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1282)
        at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:57)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
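
A minimal Java sketch of one plausible reading of the two collapse.type modes follows. It is not the patch's implementation: it assumes the input is just the collapse-field value of each matching document in ranked order, and it returns the positions of the documents that would be kept. In this reading, "normal" keeps at most collapse.max documents per field value anywhere in the list, while "adjacent" only limits runs of consecutive documents that share a value. The class name CollapseModes is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the two collapse.type modes; not the patch's code.
// Input: the collapse-field value of each matching document, in ranked order.
// Output: the positions (0-based) of the documents that stay in the result set.
final class CollapseModes {

    private CollapseModes() {}

    // "normal": keep at most `max` documents per field value, wherever they appear.
    static List<Integer> normal(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            int count = seen.merge(values.get(i), 1, Integer::sum);
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    // "adjacent": only limit runs of consecutive documents with the same value;
    // a value may show up again once a different value has broken the run.
    static List<Integer> adjacent(List<String> values, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            run = value.equals(previous) ? run + 1 : 1;
            previous = value;
            if (run <= max) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For example, with the values a, a, b, a, a and a maximum of 1, normal(...) keeps positions 0 and 2, whereas adjacent(...) keeps 0, 2 and 3, because the third "a" starts a new run after "b".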

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: SOLR-236.patch

Updated the patch so that it applies without conflicts to the current trunk. It also includes a bugfix for an issue with field collapsing and the filter cache that Varun Gupta noticed on the user mailing list.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754511#action_12754511 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Oleg, no, I have not made any progress. As I wrote in my previous comment, I'm still not clear on how to solve it efficiently:

{quote}
I was trying to come up with a solution to implement distributed field collapsing, but I ran into a problem that I could not solve in an efficient manner.

Field collapsing keeps track of the number of documents collapsed per unique field value and the total count of documents encountered per unique field value. If the total count is greater than the specified collapse threshold, then the number of documents collapsed is the difference between the total count and the threshold. Let's say we have two shards, and each shard has one document with the same field value. The collapse threshold is one, meaning that if we run the collapsing algorithm on each shard individually, neither document will ever be collapsed. But when the algorithm is applied to both shards together, one of the documents must be collapsed; however, neither shard knows that its document is the one to collapse.

There are more situations like the one described above, but it all boils down to the fact that each shard has no meta information about the other shards in the cluster. Sharing the intermediate collapse results between the shards is, in my opinion, not an option, because then you would also need to share information about documents / fields that have a collapse count of zero. This is totally impractical for large indexes.

Besides that, there is another problem with distributed field collapsing. Field collapsing only keeps the most relevant document in the result set and collapses the less relevant ones. If results are sorted by score, field collapsing will fail to do this properly, because there is no global scoring (idf).

Does anyone have an idea on how to solve this? The first problem seems related to the same kind of problem that implementing global scoring has.
{quote}
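
The counting rule in the quoted comment can be illustrated with a few lines of Java. This is a sketch of the arithmetic only, assuming the input is a plain list of collapse-field values; the class name is hypothetical and it is not code from any of the attached patches.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the counting rule only (hypothetical class, not from any attached patch):
// for each unique field value, collapsed = max(0, totalCount - threshold).
final class CollapseCounts {

    private CollapseCounts() {}

    static Map<String, Integer> collapsedPerValue(List<String> values, int threshold) {
        // Total number of documents seen per unique field value.
        Map<String, Integer> totals = new HashMap<>();
        for (String value : values) {
            totals.merge(value, 1, Integer::sum);
        }
        // Documents beyond the threshold are the ones that get collapsed.
        Map<String, Integer> collapsed = new HashMap<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            collapsed.put(e.getKey(), Math.max(0, e.getValue() - threshold));
        }
        return collapsed;
    }
}
```

Calling collapsedPerValue on ["x"] for each shard with a threshold of 1 reports 0 collapsed documents for "x" on both shards, while calling it on the merged list ["x", "x"] reports 1. That difference is exactly the information neither shard has on its own.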

I recently read something about Katta and . Katta facilitates distributed search and has support for global scoring. I'm not completely sure how it is implemented in Katta, but maybe with Katta it is relatively efficient to share the intermediate collapse results between shards.
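
To make the global scoring (idf) point from the quoted comment concrete, the sketch below compares per-shard idf values with an idf computed from summed shard statistics. It assumes the classic Lucene DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)); the class name is hypothetical and the shard sizes in the usage note are made-up numbers.

```java
// Illustrative sketch: per-shard vs. global idf.
// Assumes the classic Lucene DefaultSimilarity formula
// idf = 1 + ln(numDocs / (docFreq + 1)).
final class ShardIdf {

    private ShardIdf() {}

    // idf computed from one index's (or one shard's) statistics.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // "Global" idf: sum the per-shard statistics first, then apply the same formula.
    // This requires every shard to share its docFreq and numDocs for the term.
    static double globalIdf(int[] docFreqs, int[] numDocs) {
        if (docFreqs.length != numDocs.length) {
            throw new IllegalArgumentException("need one docFreq and one numDocs per shard");
        }
        int totalDf = 0;
        int totalDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDf += docFreqs[i];
            totalDocs += numDocs[i];
        }
        return idf(totalDf, totalDocs);
    }
}
```

With two shards of 1000 documents each, where a term occurs in 9 documents on one shard and 499 on the other, the per-shard idf values are about 5.61 and 1.69, while the global value (508 of 2000 documents) is about 2.37. Scores computed on different shards are therefore not directly comparable, which is what makes picking "the most relevant" document per group across shards unreliable.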


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728044#action_12728044 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Oleg, I have checked your latest patch, but I could not find the code that deals with distributed search. How did you make collapsing work for distributed search? Which parameters did you use when searching? As far as I can tell, the latest patches do not support field collapsing for distributed search.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779418#action_12779418 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

I can confirm this bug. I will attach a new patch that fixes this issue shortly. Thanks for noticing. 

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: collapse_field.patch

Replacing HashDocSet with BitDocSet for hasMoreResult, for better performance

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
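
The parameters in this earlier version of the description (collapse, collapse.field, collapse.max) are ordinary request parameters, so a collapsing request is just an encoded query string. The sketch below assembles one in Java; the helper class, the field name site and the values are hypothetical, and URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a URL query string from request parameters.
final class CollapseQuery {

    private CollapseQuery() {}

    // Joins URL-encoded key=value pairs with '&', in the map's iteration order
    // (use a LinkedHashMap to keep the order you inserted the parameters in).
    static String queryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }
}
```

Given a LinkedHashMap with q=*:*, collapse=true, collapse.field=site and collapse.max=1 (in that order), queryString returns q=*%3A*&collapse=true&collapse.field=site&collapse.max=1, which could then be appended to a select URL.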


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Aytek Ekici (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766018#action_12766018 ] 

Aytek Ekici commented on SOLR-236:
----------------------------------

Hi Martijn,
Thanks a lot it works.

Aytek

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Kevin Cunningham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799409#action_12799409 ] 

Kevin Cunningham commented on SOLR-236:
---------------------------------------

Which patch is recommended for those running a stock 1.4 release?  

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Peter Karich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ] 

Peter Karich edited comment on SOLR-236 at 2/18/10 4:06 PM:
------------------------------------------------------------

The latest patch from 1 Feb 2010 compiles against solr-2010-02-13 from the nightly build but does not work. If I query

http://server/cs-bidcs/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException
        at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
        at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
        at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
        at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
        at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
        at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
        at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        ...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(

      was (Author: peathal):
    The latest patch from 1 Feb 2010 compiles against solr-2010-02-13 from the nightly build but does not work. If I query

http://searchdev05:15100/cs-bidcs/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException
        at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
        at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
        at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
        at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
        at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
        at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
        at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        ...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Traeger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659300#action_12659300 ] 

Thomas Traeger commented on SOLR-236:
-------------------------------------

I tested Solr 1.3 with Ivan's latest patch.

When I add a Filter Query (fq param) to my query, I get the exception "Either filter or filterList may be set in the QueryCommand, but not both." I'm not that familiar with Java, but I at least disabled the exception in SolrIndexSearcher.java. I can use Filter Queries now, and no problems have occurred so far. But surely this has to be handled in another way.

Btw, I think Karsten already fixed this in some way back in 2007 (patch field-collapsing-extended-592129.patch). His comment on it was:

"Made a minimal change to SolrIndexSearcher.getDocListC() to support passing both the filter and filterList parameters. In most cases this was already handled anyway."

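The quoted fix accepts both the filter and filterList parameters instead of rejecting the combination. One way to do that is to intersect the single filter with every entry of the filter list. The sketch below is a hypothetical illustration that uses java.util.BitSet in place of Solr's DocSet; the class and method names are made up, and it is not the code from Karsten's patch or from SolrIndexSearcher.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: combine an explicit filter with a list of filters by
// intersecting them, instead of rejecting the combination. java.util.BitSet
// stands in for Solr's DocSet (bit i set = document i passes the filter).
final class CombinedFilter {

    private CombinedFilter() {}

    // Returns null when neither a filter nor a filter list is given,
    // meaning "no filtering". The inputs are never modified.
    static BitSet combine(BitSet filter, List<BitSet> filterList) {
        BitSet result = (filter == null) ? null : (BitSet) filter.clone();
        if (filterList != null) {
            for (BitSet f : filterList) {
                if (result == null) {
                    result = (BitSet) f.clone();
                } else {
                    result.and(f); // keep only documents that pass every filter
                }
            }
        }
        return result;
    }
}
```

For a filter matching documents {1, 2, 3} and a filter list of {2, 3, 4} and {3, 5}, combine returns {3}, the documents that satisfy all three.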


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Peter Karich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835230#action_12835230 ] 

Peter Karich commented on SOLR-236:
-----------------------------------

We are facing OutOfMemory problems too. We are using https://issues.apache.org/jira/secure/attachment/12425775/field-collapse-5.patch

> Are you using any other features besides plain collapsing? The field collapse cache gets large very quickly,
> I suggest you turn it off (if you are using it). Also you can try to make your filterCache smaller.

How can I turn off the collapse cache or make the filterCache smaller?
Are there other workarounds? For example, using a special version of the patch?

I read that specifying collapse.maxdocs could help, but it didn't in our case. Could collapse.type=adjacent help here? (https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12495376&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12495376)

What do you think?

BTW: We really like this patch and would like to use it! :-)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793607#action_12793607 ] 

Shalin Shekhar Mangar commented on SOLR-236:
--------------------------------------------

{quote}
This is exactly the point, it's not really meta-data over the document, but on the group the document belongs to. And you also need a more obvious way to mark this document as a group representation (to distinguish it from other normal documents).
{quote}

We show the highest scoring document of a group, so does the fact that the metadata belongs to the group and not the document matter at all?

{quote}
But extending the current <doc> element doesn't mean we break BWC. Adding a <collapse-info> (or <collapse-meta-data>) sub-element to it will certainly not break anything, especially since we still don't have a formal XSD for the responses (I know we're working on it, but it's still not out there, so it's safe).
{quote}

We are not extending anything. We're just adding a couple of fields which may not exist in the index and this is a capability we plan to introduce anyway (however this issue does not need to depend on SOLR-1566). The response format remains exactly the same. There is no break in compatibility.
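A minimal sketch of the approach Shalin describes, assuming a result document is modeled as an ordered map of field names to values. The field names `collapseValue` and `collapseCount` are hypothetical; the point is that the group information is carried as extra fields on the highest-scoring document, so the `<doc>` element keeps its existing shape.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: group metadata attached to the head (highest
// scoring) document as extra, non-indexed fields instead of a nested element.
class CollapseFields {
    static Map<String, Object> annotateHead(Map<String, Object> headDoc,
                                            String groupValue, int collapsedCount) {
        // Copy so the stored fields come first and the input map stays untouched.
        Map<String, Object> out = new LinkedHashMap<>(headDoc);
        out.put("collapseValue", groupValue);     // hypothetical field name
        out.put("collapseCount", collapsedCount); // hypothetical field name
        return out;
    }
}
```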

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495334 ] 

Ryan McKinley commented on SOLR-236:
------------------------------------

This looks good. Someone with better Lucene chops should look at the IndexSearcher getDocListAndSet part...

A few comments/questions about the interface:

If you index all the example docs and request:
http://localhost:8983/solr/select/?q=*:*&collapse=true

you get a 500 error. We should use params.required().get("collapse.field") to get a nicer error message.
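Solr's `params.required()` wraps the parameters so that a missing one is reported as a bad request (HTTP 400) naming the parameter, rather than failing later with a 500. A minimal self-contained sketch of that idea, using a plain `Map` in place of `SolrParams`; `RequiredParams` and `BadRequestException` are hypothetical names, not Solr classes.

```java
import java.util.Map;

// Stand-in for a bad-request error; inside Solr this role is played by a
// SolrException carrying the BAD_REQUEST error code.
class BadRequestException extends RuntimeException {
    BadRequestException(String msg) { super(msg); }
}

// Hypothetical helper mirroring the idea of params.required().get(name):
// a missing parameter becomes a clear client error instead of a server error.
class RequiredParams {
    static String required(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null) {
            throw new BadRequestException("Missing required parameter: " + name);
        }
        return value;
    }
}
```

With `q=*:*&collapse=true` and no `collapse.field`, the call fails with "Missing required parameter: collapse.field".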

With:
http://localhost:8983/solr/select/?q=*:*&collapse=true&collapse.field=manu&collapse.max=1

the collapse info at the bottom says:

<lst name="collapse_counts">
 <int name="has_more_results">3</int>
 <int name="has_more_results">5</int>
 <int name="has_more_results">9</int>
</lst>

What does that mean? How would you use it? How does it relate to the docs in <result>?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: field_collapsing.patch

This release conforms more closely to the semantics of "field collapsing".

Parameters are:

collapse=true                    // enable collapsing
collapse.field=[field]           // indexed field used for collapsing
collapse.max=[integer]           // start collapsing after n documents
collapse.type=[normal|adjacent]  // default value is "normal"

- "adjacent" collapses only consecutive documents.
- "normal" collapses all documents that have the same value in the collapsing field.
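A minimal sketch of the difference between the two collapse types, assuming (from the parameter descriptions above) that `collapse.max` means "keep at most n documents per group" in normal mode and "keep at most n documents per run of identical values" in adjacent mode. The input is the collapse-field value of each document in ranked order; `CollapseSketch` is a hypothetical name and this is not the patch's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: values.get(i) is the (non-null) collapse-field value of
// the i-th ranked document. Each method returns the positions that survive.
class CollapseSketch {
    // "normal": at most max documents per distinct value, anywhere in the list
    static List<Integer> normal(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            // number of documents with this value seen so far, including this one
            int n = seen.merge(values.get(i), 1, Integer::sum);
            if (n <= max) kept.add(i);
        }
        return kept;
    }

    // "adjacent": at most max documents per run of consecutive equal values
    static List<Integer> adjacent(List<String> values, int max) {
        List<Integer> kept = new ArrayList<>();
        int run = 0;
        for (int i = 0; i < values.size(); i++) {
            // the run restarts whenever the value differs from the previous document
            run = (i > 0 && values.get(i).equals(values.get(i - 1))) ? run + 1 : 1;
            if (run <= max) kept.add(i);
        }
        return kept;
    }
}
```

For the values `a, a, b, a, c, c` with `collapse.max=1`, normal mode keeps positions 0, 2 and 4, while adjacent mode also keeps position 3, because that `a` is not next to the earlier ones. In this sketch, adjacent mode only needs the previous document's value, whereas normal mode keeps a counter for every distinct value it has seen.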

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792510#action_12792510 ] 

Mark Miller edited comment on SOLR-236 at 12/18/09 3:41 PM:
------------------------------------------------------------

bq. (Faceting got a 50 times perf boost in 1.4)

No it didn't. Certain cases have gotten a boost (I think you might be referring to multi-valued field faceting cases?). And general faceting was always relatively fast and scalable.

I'm against committing features to trunk with a warning that the feature is not ready for trunk.

      was (Author: markrmiller@gmail.com):
    bq. (Faceting fot a 50 times perf boost in 1.4)

No it didn't. Certain cases have gotten a boost (I think you might be referring to multi-field faceting cases?). And general faceting was always relatively fast and scalable.

I'm against committing features to trunk with a warning that the feature is not ready for trunk.
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Peter Karich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841147#action_12841147 ] 

Peter Karich commented on SOLR-236:
-----------------------------------

Regarding the OutOfMemory problem: we are now testing the suggested change in production.

I replaced the float array with a TreeMap<Integer, Float>. The change was nearly trivial (I cannot easily provide a patch because we are using an older version of the patch, although I could post the 3 changed files).

I used a TreeMap instead of a HashMap because the advance method of NonAdjacentDocumentCollapser.PredefinedScorer needs the tailMap method:

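{noformat}
public int advance(int target) throws IOException {
    // tailMap() is why a TreeMap is needed: it views the scored docs with id >= target
    iter = scores.tailMap(target).entrySet().iterator();
    // note: returning target is only correct if target itself has a score
    if (iter.hasNext())
        return target;
    else
        return NO_MORE_DOCS;
}
{noformat}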
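Lucene's `DocIdSetIterator.advance(target)` is documented to move to the first document at or after `target` and return that document's id. A self-contained sketch of a TreeMap-backed scorer that does this, with no Lucene dependency; `SparseScores` and its methods are hypothetical stand-ins that only mirror the `nextDoc()`/`advance()`/`docID()`/`score()` shape, and `NO_MORE_DOCS` reuses Lucene's sentinel value `Integer.MAX_VALUE`.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical scorer over a sparse TreeMap<docId, score>. advance(target)
// positions on the first scored doc >= target and returns that doc's id.
class SparseScores {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel as Lucene

    private final TreeMap<Integer, Float> scores;
    private Iterator<Map.Entry<Integer, Float>> iter;
    private int doc = -1;  // -1 until the first nextDoc()/advance()
    private float score;

    SparseScores(TreeMap<Integer, Float> scores) {
        this.scores = scores;
        this.iter = scores.entrySet().iterator();
    }

    int docID() { return doc; }

    float score() { return score; }

    int nextDoc() {
        if (iter.hasNext()) {
            Map.Entry<Integer, Float> e = iter.next();
            doc = e.getKey();
            score = e.getValue();
        } else {
            doc = NO_MORE_DOCS;
        }
        return doc;
    }

    int advance(int target) {
        // tailMap(target, true) views only the entries with key >= target
        iter = scores.tailMap(target, true).entrySet().iterator();
        return nextDoc(); // consume the first such entry, whatever its key is
    }
}
```

Only documents that actually have a score take up memory, unlike a `float[maxDoc]` array, at the cost of a larger per-entry overhead.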

Then, I think, I discovered a bug or at least inconsistent behaviour: if I run the test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, the old version creates the scores array as new float[maxDocs], but the array is never filled with any values. With the TreeMap, Float value1 = values.get(doc1); therefore returns null in NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the TreeMap is empty!). I work around this via:

{noformat}
// docs without an entry in the map compare as if their value were 0
if (value1 == null)
    value1 = 0f;
if (value2 == null)
    value2 = 0f;
{noformat}

But should the compare method be called at all if there are no docs in the scores array?
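Since Java 8, `Map.getOrDefault` can express the same null-to-0f default directly. A sketch of a comparator over document ids whose values live in a sparse map; `SparseValueComparator` is a hypothetical name, and ascending order is an assumption because the original comparator's ordering is not shown.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical comparator over doc ids whose values live in a sparse map.
// Docs without an entry compare as 0f, the same default as the workaround above.
class SparseValueComparator implements Comparator<Integer> {
    private final Map<Integer, Float> values;

    SparseValueComparator(Map<Integer, Float> values) {
        this.values = values;
    }

    @Override
    public int compare(Integer doc1, Integer doc2) {
        float v1 = values.getOrDefault(doc1, 0f);
        float v2 = values.getOrDefault(doc2, 0f);
        return Float.compare(v1, v2); // ascending by value
    }
}
```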

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Charles Hornberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565194#action_12565194 ] 

Charles Hornberger commented on SOLR-236:
-----------------------------------------

bq. As Yonik has pointed out, operations on a NegatedDocSet can be rewritten as (different) operations on the set being negated. The operation methods inside NegatedDocSet do this.

Right. I realized, sheepishly, after I posted the first suggested patch that it'd be much simpler to just mimic the first if-clause in DocSet.intersection():

{code}
  if (other instanceof NegatedDocSet) {
    return other.intersection(this);
  }
{code}

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Karsten Sperling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565158#action_12565158 ] 

Karsten Sperling commented on SOLR-236:
---------------------------------------

NegatedDocSet was introduced because the filter logic expects to use the intersection operation to apply a number of filters to a result. Introducing a negated DocSet was much easier than supporting both intersection and and-not filters.

NegatedDocSet does not support iteration because the negation of a finite set is (at least theoretically) infinite. Even though it would in practice be possible to limit the negated set via the known maximum document id, this would probably not be very efficient. However, it is simply not necessary to ever iterate over the elements of a NegatedDocSet, because we know that the end-result of all DocSet operations is going to be a finite set of results, not an infinite one. A NegatedDocSet will only ever be used to "subtract" from a finite DocSet. As Yonik has pointed out, operations on a NegatedDocSet can be rewritten as (different) operations on the set being negated. The operation methods inside NegatedDocSet do this.
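A minimal sketch of that rewriting, with `java.util.BitSet` standing in for Solr's DocSet classes; `DocFilter` and its `negated` flag are hypothetical. A negated filter is never enumerated: intersecting a finite result with it becomes an and-not against the set being negated, so a mixed list of filters can still be applied with intersection alone.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical filter: "docs in set" or, when negated, "docs NOT in set".
// The negated form is never iterated; it is only subtracted from a finite set.
class DocFilter {
    final BitSet set;
    final boolean negated;

    DocFilter(BitSet set, boolean negated) {
        this.set = set;
        this.negated = negated;
    }

    /** Intersects a finite result with this filter, returning a new set. */
    BitSet intersect(BitSet result) {
        BitSet out = (BitSet) result.clone();
        if (negated) {
            out.andNot(set); // result AND (NOT set) == result minus set
        } else {
            out.and(set);
        }
        return out;
    }

    /** Applies every filter in turn; callers only ever need intersection. */
    static BitSet applyAll(BitSet result, List<DocFilter> filters) {
        BitSet current = result;
        for (DocFilter f : filters) {
            current = f.intersect(current);
        }
        return current;
    }
}
```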

The bug occurs because of the naive way binary set operation calls are dispatched: DocSet clients simply call, e.g., set1.intersection(set2), arbitrarily leaving the choice of implementation to the class of set1. Currently, BitDocSet does not know about NegatedDocSet, and hence does not perform the necessary rewriting or delegation to NegatedDocSet.

However, instead of requiring each and every DocSet subclass to know about all the others (and in the absence of language support for multiple dispatch), I think it would be better to centralize this knowledge in a single class, DocSetOp, with static methods that select the appropriate implementation for an operation based on the types of _both_ parameters. The client code could be changed to call DocSetOp.intersection(a, b) instead of a.intersection(b), but this would involve changing the DocSet interface. A backwards-compatible solution would be to simply make DocSetBase.intersection() final and have it delegate to DocSetOp.intersection().
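A sketch of the proposed DocSetOp dispatch, using heavily simplified, hypothetical versions of `BitDocSet` and `NegatedDocSet` (not Solr's real classes). The operation is chosen from the types of both operands, including the case where both sets are negated.

```java
import java.util.BitSet;

// Hypothetical, heavily simplified DocSet hierarchy (not Solr's real classes).
interface DocSet {}

final class BitDocSet implements DocSet {
    final BitSet bits;
    BitDocSet(BitSet bits) { this.bits = bits; }
}

// The complement of 'inner': conceptually infinite, so never iterated.
final class NegatedDocSet implements DocSet {
    final BitDocSet inner;
    NegatedDocSet(BitDocSet inner) { this.inner = inner; }
}

// Central dispatcher: picks the implementation from the types of BOTH operands,
// so BitDocSet and NegatedDocSet do not need to know about each other.
final class DocSetOp {
    static DocSet intersection(DocSet a, DocSet b) {
        if (a instanceof BitDocSet && b instanceof BitDocSet) {
            return new BitDocSet(and(((BitDocSet) a).bits, ((BitDocSet) b).bits));
        }
        if (a instanceof BitDocSet && b instanceof NegatedDocSet) {
            // A AND (NOT B) == A minus B
            return new BitDocSet(andNot(((BitDocSet) a).bits, ((NegatedDocSet) b).inner.bits));
        }
        if (a instanceof NegatedDocSet && b instanceof BitDocSet) {
            return intersection(b, a); // intersection is commutative
        }
        if (a instanceof NegatedDocSet && b instanceof NegatedDocSet) {
            // (NOT X) AND (NOT Y) == NOT (X OR Y): stays a negated set
            BitSet union = (BitSet) ((NegatedDocSet) a).inner.bits.clone();
            union.or(((NegatedDocSet) b).inner.bits);
            return new NegatedDocSet(new BitDocSet(union));
        }
        throw new IllegalArgumentException("unsupported DocSet types");
    }

    private static BitSet and(BitSet x, BitSet y) {
        BitSet r = (BitSet) x.clone();
        r.and(y);
        return r;
    }

    private static BitSet andNot(BitSet x, BitSet y) {
        BitSet r = (BitSet) x.clone();
        r.andNot(y);
        return r;
    }
}
```

Under the backwards-compatible variant, a final DocSetBase.intersection(other) would just return DocSetOp.intersection(this, other).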

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751181#action_12751181 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Thomas,

Comparing my format proposal with yours, the difference is how I output the collapsed documents. I chose to put all collapsed values for a field in a single element, because that makes the output more compact and thus cheaper to transmit over the wire (especially if the number of collapsed documents to return is large). This approach is not standard in Solr, and your result structure is more common. Since I think most of the time is probably spent reading the collapsed field values from the index anyway (I/O), your result structure is probably the best way to go for now.

I think that supporting the 'old' format is not a good idea, because it only increases the complexity of the code. Also, field collapsing is just a patch (although it has been around for a while) and not a core Solr feature. People using this patch (or any patch) should always be aware that everything in it is subject to change. I think that _collapse.response_ should be renamed to something like _collapse.includeCollapsedDocs_; when it is specified, the collapsed documents are included. _collapse.includeCollapsedDocs.fl_ would then include only the specified fields in the collapsed documents. So specifying _collapse.includeCollapsedDocs=true_ would result in the following output:
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="results">
        <lst name="233238">
            <str name="fieldValue">melkweg</str>
            <int name="collapseCount">2</int>
            <lst name="collapsedDocs">
                <doc>
                    <str name="id">233239</str>
                    <str name="name">Foo Bar</str>
                    ...
                </doc>
                <doc>
                    <str name="id">233240</str>
                    <str name="name">Foo Bar 2</str>
                    ...
                </doc>
            </lst>
        </lst>
    </lst>
</lst>
{code}
Not specifying _collapse.includeCollapsedDocs_ would result in the following output:
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="results">
        <lst name="233238">
            <str name="fieldValue">melkweg</str>
            <int name="collapseCount">2</int>
        </lst>
    </lst>
</lst>
{code}
This will be the default and only response format.
When, for example, _collapse.info.doc=false_ is specified, the following result would be returned:
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="results"> 
        <lst name="melkweg"> <!-- we can not use the head document id any more, so we use the field value --> 
            <int name="collapseCount">2</int>
        </lst>
    </lst>
</lst>
{code}
When _collapse.info.count=false_ is specified, the _fieldValue_ would simply be removed from the response. I do not know whether many people actually set these parameters to false, but it is something to keep in mind. I also recently added field collapsing support to SolrJ in the patch; obviously this has to be updated to the latest response format.

In general, it must be made clear to Solr users that this feature is handy but can dramatically degrade performance. The response can contain a lot of documents, and each field value has to be read from the index, which results in a lot of I/O on the Solr side. Because so much data can be returned, simply viewing the response in a browser can become quite a challenge.

But more importantly, do you think these changes (response format / request parameters) are acceptable?
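A minimal sketch of how the proposed `collapse_counts` section for one group could be assembled, using ordered maps in place of Solr's NamedList. `CollapseResponse` and its signature are hypothetical; the keys follow the XML examples above, and the `fl` argument plays the role of the proposed _collapse.includeCollapsedDocs.fl_ parameter (null meaning all fields).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed collapse_counts section for a single group.
class CollapseResponse {
    static Map<String, Object> build(String field, String headId, String fieldValue,
                                     List<Map<String, Object>> collapsedDocs,
                                     boolean includeCollapsedDocs, Set<String> fl) {
        Map<String, Object> group = new LinkedHashMap<>();
        group.put("fieldValue", fieldValue);
        group.put("collapseCount", collapsedDocs.size());
        if (includeCollapsedDocs) {
            List<Map<String, Object>> docs = new ArrayList<>();
            for (Map<String, Object> doc : collapsedDocs) {
                Map<String, Object> slim = new LinkedHashMap<>();
                for (Map.Entry<String, Object> e : doc.entrySet()) {
                    // keep only the requested fields (fl == null keeps all)
                    if (fl == null || fl.contains(e.getKey())) {
                        slim.put(e.getKey(), e.getValue());
                    }
                }
                docs.add(slim);
            }
            group.put("collapsedDocs", docs);
        }
        Map<String, Object> results = new LinkedHashMap<>();
        results.put(headId, group); // groups keyed by the head document's id
        Map<String, Object> counts = new LinkedHashMap<>();
        counts.put("field", field);
        counts.put("results", results);
        return counts;
    }
}
```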


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Doug Steigerwald (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655285#action_12655285 ] 

Doug Steigerwald commented on SOLR-236:
---------------------------------------

Looks fine from my little bit of testing.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ron Veenstra (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716459#action_12716459 ] 

Ron Veenstra commented on SOLR-236:
-----------------------------------

Thomas,

Again thanks.  I've verified that the CollapseComponent is indeed NOT present in the war.  That'd suggest something going amiss during the patching process, correct?  And as it appears to be happening each time, either there's an issue with the patch (which others have verified as working) or something conflicts with my current setup (solr / tomcat / CentOS).  Can I manually create apache-solr-core and force the file in?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Heigl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850921#action_12850921 ] 

Thomas Heigl commented on SOLR-236:
-----------------------------------

@Martijn:

There is a small problem with the latest patch file. Both TortoiseSVN and patch complain that the file is malformed because there is an "empty" patch for FieldCollapseResponse.java around line 2199. Simply removing lines 2195-2199 does the trick.

Apart from that, the patch works perfectly for me.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566864#action_12566864 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

Hello everyone. I am planning to implement chain collapsing in a high-traffic production environment, so I'd like to use a stable version of Solr. It doesn't seem like you have a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to work fine at collapsing, but how do I get a count for the documents other than the one being displayed?

As a result I see:
{code:xml}
<lst name="collapse_counts">
    <int name="Restaurant">2414</int>
    <int name="Bar/Club">9</int>
    <int name="Directory & Services">37</int>
</lst>
{code}

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory & Services? If so, then that's great.

However, when I collapse on some integer fields, I get an empty list for collapse_counts. Do counts only work for text fields?

Thanks in advance for any help you can provide!

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501405 ] 

Ryan McKinley commented on SOLR-236:
------------------------------------

I just took a look at this using the example data:
http://localhost:8983/solr/select/?q=*:*&collapse.field=cat&collapse.max=1&collapse.type=normal&rows=10

{code:xml}
<lst name="collapse_counts">
 <str name="field">cat</str>
 <lst name="doc">
  <int>1</int>
  <int name="1">2</int>
  <int name="2">2</int>
  <int name="4">1</int>
  <int name="7">1</int>
 </lst>
 <lst name="count">
  <int>1</int>
  <int name="card">2</int>
  <int name="drive">2</int>
  <int name="hard">1</int>
  <int name="music">1</int>
 </lst>
</lst>
{code}

- - -

What is the "<int>1</int>" at the front of each response?

Perhaps the 'doc' results should be renamed 'offset' or 'index', and then another list named 'doc' could use the uniqueKey as the index... this would be useful for building a Map.
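To illustrate the Map idea, here is a minimal Java sketch of re-keying offset-based collapse counts by uniqueKey on the client side. It assumes the names in the 'doc' list (1, 2, 4, 7 above) are zero-based positions in the returned page of results, which is an assumption, not something the response states; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: re-keys per-offset collapse counts by the head document's uniqueKey. */
public class CollapseCountsByKey {

    /**
     * @param pageIds        uniqueKey values of the returned documents, in result order
     * @param countsByOffset collapse count per head document, keyed by its (assumed zero-based) offset
     */
    public static Map<String, Integer> byUniqueKey(List<String> pageIds, Map<Integer, Integer> countsByOffset) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : countsByOffset.entrySet()) {
            int offset = e.getKey();
            if (offset < 0 || offset >= pageIds.size()) {
                continue; // offset outside the returned page (e.g. a start/rows mismatch): skip it
            }
            result.put(pageIds.get(offset), e.getValue());
        }
        return result;
    }
}
```

If the server emitted the uniqueKey directly, this client-side join (and the offset assumption it depends on) would not be needed.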

- - -

Also, check:
http://localhost:8983/solr/select/?q=*:*&collapse.field=cat&collapse.max=1&collapse.type=adjacent&rows=50

That request throws an ArrayIndexOutOfBoundsException.

- - -

> You should add the following constraint on the wiki: The collapsing field must be un-tokenized.

Anyone can edit the wiki (you just have to make an account) -- it would be great if you could help keep the page accurate / useful.  JIRA discussion comment trails don't work so well at that...

Re: tokenized... what about it does not work? Are the limitations any different if it is multi-valued? Is it just that if any token matches within the field it will collapse, and that may or may not be what you expect?

- - -

Did you get a chance to look at the questions from the previous discussion?  I just noticed Yonik posted something new there:
http://www.nabble.com/result-grouping--tf2910425.html#a10959848


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792916#action_12792916 ] 

Yonik Seeley commented on SOLR-236:
-----------------------------------

First, thanks to everyone who has spent so much time working on this - lack of committer attention doesn't equate to lack of interest... this is a very much needed feature!

I'd agree with Erik that the most important thing is the interface to the client, and making it well thought out and semantically "tight". Martijn's recent improvements to the response structure is an example of improvements in this area. It's also important to think about the interface in terms of how easy it will be to add further features, optimizations, and support distributed search. If the code isn't sufficiently standalone, we also need to see how easily it fits into the rest of Solr (what APIs it adds or modifies, etc). Actually implementing performance improvements and more distributed search can come later - as long as we've thought about it now so we haven't boxed ourselves in.

It seems like field collapsing should just be additional functionality of the query component rather than a separate component since it changes the results?

The most basic question about the interface would be how to present groups. Do we stick with a linear document list and supplement that with extra info in a different part of the response (as the current approach takes)? Or stick that extra info in with some of the documents somehow? Or if collapse=true, replace the list of documents with a list of groups, each which can contain many documents? Which will be easiest for clients to deal with? If you were starting from scratch and didn't have to deal with any of Solr's current shortcomings, what would it look like?

From the wiki:
collapse.maxdocs - what does this actually mean? I assume it collects arbitrary documents up to the max (normally by index order)? Does this really make sense? Does it affect faceting, etc? If it does make sense, it seems like it would also make sense for normal non-collapsed query results too, in which case it should be implemented at that level.

collapse.info.doc - what does that do? I understand counts per group, but what's count per doc?

collapse.includeCollapsedDocs.fl - I don't understand this one, and can't find an example on the wiki or blogs. It says "Parameter indicating to return the collapsed documents in the response"... but I thought documents were included up until collapse.threshold.

collapse.debug - should perhaps just be rolled into debugQuery, or another general debug param (someone recently suggested using a comma-separated list: debug=timings,query, etc.).

Should I be able to specify a completely different sort *within* a group? collapse.sort=... seems nice... what are the implications? One bit of strangeness: it would seem to allow the highly ranked document that put the group at the top of the list to be dropped from the group because of a different sort criterion within the group. It's not necessarily an implementation problem, though (sort values for the group should be maintained separately).

Is there a way to specify the number of groups that I want back instead of the number of documents? Or am I supposed to just over-request (rows=num_groups_I_want*threshold) and ignore if I get too many documents back?

Random thought: We need a test to make sure this works with multi-select faceting (SimpleFacets asks for the docset of the base query...).

Distributed Search: should be able to use the same type of algorithm that faceting does to ensure accurate counts.

Performance: yes, it looks like the current code uses a *lot* of memory.
Here's an algorithm that I thought of on my last plane ride that can do much better (assuming max() is the aggregation function):

{code}
=================== two pass collapsing algorithm for collapse.aggregate=max ====================
First pass: pretend that collapseCount=1
  - Use a TreeSet as a priority queue since one can remove and insert entries.
  - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
  - compare new doc with smallest element in treeset (once it holds the top number of groups). If smaller, discard and go to the next doc.
  - If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not
    (evicting the smallest entry, and its Map entry, if the TreeSet is now too large).
  - If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node,
    remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)
We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
 - create a priority queue for each group (10) of size collapseCount
 - re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
 - for each document, find its appropriate priority queue and insert
 - optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create them for only 10 groups.
Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed.
We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code} 
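A minimal Java sketch of the two-pass idea above, assuming max(score) aggregation and that matching documents are available as an in-memory Iterable that can be traversed twice (the second traversal stands in for re-executing the query). The names TwoPassCollapser, Hit, topGroups and fillGroups are hypothetical and not part of Solr or any attached patch; in Solr the group value would come from the index rather than a field on a Java object.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeSet;

/** Hypothetical sketch of two-pass collapsing with max() aggregation; not Solr code. */
public class TwoPassCollapser {

    /** A matching document: its docid, collapse-field value and sort value (higher is better). */
    public static final class Hit {
        public final int docId;
        public final String group;
        public final float score;

        public Hit(int docId, String group, float score) {
            this.docId = docId;
            this.group = group;
            this.score = score;
        }
    }

    // Best first: higher score wins, docId breaks ties so distinct hits never compare as equal
    // (a TreeSet would otherwise silently drop one of two equal-scoring hits).
    static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.docId);

    /** First pass: the top numGroups groups, each represented by its best hit, best group first. */
    public static List<Hit> topGroups(Iterable<Hit> hits, int numGroups) {
        if (numGroups <= 0) {
            return new ArrayList<>();
        }
        TreeSet<Hit> top = new TreeSet<>(BEST_FIRST);   // last() is the weakest group head
        Map<String, Hit> headOfGroup = new HashMap<>(); // group -> its entry in the TreeSet
        for (Hit hit : hits) {
            if (top.size() == numGroups && BEST_FIRST.compare(hit, top.last()) >= 0) {
                continue; // not better than the weakest head: discard
            }
            Hit head = headOfGroup.get(hit.group);
            if (head != null) {
                if (BEST_FIRST.compare(hit, head) < 0) { // beats its group's head: replace and re-sort
                    top.remove(head);
                    top.add(hit);
                    headOfGroup.put(hit.group, hit);
                }
                continue;
            }
            top.add(hit);
            headOfGroup.put(hit.group, hit);
            if (top.size() > numGroups) { // evict the weakest group
                headOfGroup.remove(top.pollLast().group);
            }
        }
        return new ArrayList<>(top);
    }

    /** Second pass: up to collapseCount best hits for each group found in the first pass. */
    public static Map<String, List<Hit>> fillGroups(Iterable<Hit> hits, List<Hit> heads, int collapseCount) {
        Map<String, PriorityQueue<Hit>> queues = new LinkedHashMap<>();
        for (Hit head : heads) {
            // Reversed order puts the weakest kept hit at the head of each queue.
            queues.put(head.group, new PriorityQueue<Hit>(BEST_FIRST.reversed()));
        }
        for (Hit hit : hits) {
            PriorityQueue<Hit> queue = queues.get(hit.group);
            if (queue == null) {
                continue; // not one of the top groups
            }
            queue.add(hit);
            if (queue.size() > collapseCount) {
                queue.poll(); // drop the weakest
            }
        }
        Map<String, List<Hit>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, PriorityQueue<Hit>> e : queues.entrySet()) {
            List<Hit> docs = new ArrayList<>(e.getValue());
            docs.sort(BEST_FIRST);
            groups.put(e.getKey(), docs);
        }
        return groups;
    }
}
```

For example, with hits a:1.0, b:5.0, a:4.0, c:3.0, b:2.0, a:0.5 and two groups requested, the first pass keeps b (head 5.0) and a (head 4.0); group c is discarded because 3.0 does not beat the weakest head once the set is full. Both passes only ever hold state for the requested number of groups, which is the memory saving the comment is after.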

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Vaijanath N. Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644200#action_12644200 ] 

Vaijanath N. Rao commented on SOLR-236:
---------------------------------------

Hi All,

I am trying to apply this patch to the solr-1.4 code and am getting the following error
at line 58 of CollapseComponent.java:
The method getDocListAndSet (Query, List<Query>, Sort, int , int , int) in the type SolrIndexSearcher is not applicable for the arguments  (Query, List<Query>, DocSet, Sort, int , int , int)

Can anyone tell me what correction I need to make to get this code working?

--Thanks and Regards
Vaijanath

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>             Fix For: 1.4
>
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496805 ] 

Emmanuel Keller commented on SOLR-236:
--------------------------------------

You're right. As collapse.field is a required field, we don't need more information.  My first idea was to copy the behavior of facet.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 4 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases


[jira] Commented: (SOLR-236) Field collapsing

Posted by "emmanuel vecchia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718116#action_12718116 ] 

emmanuel vecchia commented on SOLR-236:
---------------------------------------

I applied the latest patch, field-collapse-solr-236-2.patch, to http://www.apache.org/dist/lucene/solr/1.3.0/apache-solr-1.3.0.tgz and tried to compile it. It seems to require org.apache.lucene.search.FieldComparator, org.apache.lucene.search.Collector and maybe other classes from Lucene. I checked out a few versions of Lucene, but looking at LUCENE-1483 it seems that only the current trunk has the classes needed. So it doesn't seem to be possible to use the patch with 1.3.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)
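As a reading aid for `collapse.type` and `collapse.max`, here is a hedged sketch of how the two modes might differ. It is not the patch's code: it assumes "normal" caps the number of hits per field value across the whole ranked list, while "adjacent" caps only runs of consecutive hits sharing a value. `CollapseSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch (not the patch's code) of the two collapse.type modes.
 * Input: the collapse.field value of each hit, in ranked order.
 * Output: indexes of the hits that survive collapsing.
 */
final class CollapseSketch {

    /** "normal" (assumed): keep at most max hits per field value, wherever they appear. */
    static List<Integer> normal(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            int count = seen.merge(values.get(i), 1, Integer::sum);
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    /** "adjacent" (assumed): keep at most max hits from each run of consecutive equal values. */
    static List<Integer> adjacent(List<String> values, int max) {
        List<Integer> kept = new ArrayList<>();
        int run = 0;
        for (int i = 0; i < values.size(); i++) {
            boolean sameAsPrevious = i > 0 && values.get(i).equals(values.get(i - 1));
            run = sameAsPrevious ? run + 1 : 1;
            if (run <= max) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For field values `a, a, b, a` with a maximum of 1, `normal` keeps indexes `[0, 2]`, while `adjacent` keeps `[0, 2, 3]` because the last `a` starts a new run.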



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Kevin Cunningham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830305#action_12830305 ] 

Kevin Cunningham edited comment on SOLR-236 at 2/5/10 11:06 PM:
----------------------------------------------------------------

Regarding Patrick's comment about a memory leak, we are seeing something similar: very large memory usage, eventually consuming all available memory. Were there any confirmed issues that may have been addressed in the later patches? We're using the 12-24 patch. Are there any toggles we can switch to keep the feature while minimizing the memory footprint?

We had been running the 11-29 field-collapse-5.patch patch and saw nothing near this amount of memory consumption.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Karsten Sperling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657167#action_12657167 ] 

Karsten Sperling commented on SOLR-236:
---------------------------------------

I'm pretty sure the problem Stephen ran into is an off-by-one error in the bitset allocation inside the collapsing code. I ran into the same problem when I customized it for internal use about half a year ago, and unfortunately forgot all about it until reading Stephen's comment just now. Basically, the bitset gets allocated one bit too small, so there's about a 1/32 chance that setting the bit for the document with the highest ID will cause an ArrayIndexOutOfBoundsException (AIOOBE).
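The 1/32 figure is what one would expect if the bitset stores bits in 32-bit words: allocating maxDoc - 1 bits instead of maxDoc only loses a whole word when maxDoc - 1 is a multiple of 32, and only then does setting the highest doc ID index past the end of the array. The sketch below reproduces that with a minimal, hypothetical `TinyBitSet` class that performs no bounds check of its own; it is not the bitset the collapsing code actually uses, whose word size this comment does not state.

```java
/**
 * Hypothetical minimal bit set backed by 32-bit words, used only to
 * illustrate the off-by-one allocation described above.
 */
final class TinyBitSet {
    private final int[] words;

    TinyBitSet(int numBits) {
        // ceil(numBits / 32) words
        words = new int[(numBits + 31) >>> 5];
    }

    void set(int bit) {
        // No explicit bounds check: an undersized array surfaces as AIOOBE.
        words[bit >>> 5] |= 1 << (bit & 31);
    }

    boolean get(int bit) {
        return (words[bit >>> 5] & (1 << (bit & 31))) != 0;
    }

    /** True if setting the highest doc ID overflows a set allocated with maxDoc - 1 bits. */
    static boolean buggyAllocationFails(int maxDoc) {
        TinyBitSet bits = new TinyBitSet(maxDoc - 1); // bug: should be new TinyBitSet(maxDoc)
        try {
            bits.set(maxDoc - 1);                     // highest valid doc ID
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }
}
```

Allocating `new TinyBitSet(maxDoc)` instead makes `set(maxDoc - 1)` safe for every index size.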

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Robert Zotter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850930#action_12850930 ] 

Robert Zotter commented on SOLR-236:
------------------------------------

@Thomas. Thanks for the input. Do you think it's best to go with a clean version of 1.4 or the latest from trunk? Basically, I'm asking whether you think trunk is semi-stable enough for a production environment. Thanks

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Iván de Prado (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679615#action_12679615 ] 

Iván de Prado commented on SOLR-236:
------------------------------------

It's not random. I don't remember exactly, but I think the documents are sorted by the collapsing field. After that, they are grouped sequentially until maxdocs is reached. The resulting groups are the documents that are presented, so the number of groups is never larger than maxdocs.

Summary: only the first maxdocs documents are scanned to generate the resulting groups.
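A minimal sketch of the behaviour described above, assuming hits are sorted by the collapse field value and only the first maxDocs of them are grouped. `MaxDocsGrouping.group` is a hypothetical helper that works on plain field values rather than Solr doc IDs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group only the first maxDocs hits after sorting by the collapse field. */
final class MaxDocsGrouping {

    /** Returns group value -> number of scanned hits in that group, in sorted order. */
    static Map<String, Integer> group(List<String> fieldValues, int maxDocs) {
        List<String> sorted = new ArrayList<>(fieldValues);
        sorted.sort(Comparator.naturalOrder());        // documents sorted by the collapsing field
        Map<String, Integer> groups = new LinkedHashMap<>();
        int limit = Math.min(maxDocs, sorted.size());
        for (int i = 0; i < limit; i++) {              // only the first maxDocs are scanned
            groups.merge(sorted.get(i), 1, Integer::sum);
        }
        return groups;                                 // groups.size() <= maxDocs
    }
}
```

With values `c, a, b, a, c` and `maxDocs = 3`, only `a, a, b` are scanned, so the result holds `a=2` and `b=1`, and group `c` never appears even though it matches the query.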



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "German Attanasio Ruiz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779061#action_12779061 ] 

German Attanasio Ruiz commented on SOLR-236:
--------------------------------------------

Sorting of results doesn't work properly. Below I detail the steps I followed and the problem I faced.

I am using Solr as a search engine for web pages; I use a field named "site" for collapsing and sort by score.

Steps
After downloading the latest version of Solr, "solr-2009-11-15", and applying the patch "field-collapse-5.patch 2009-11-15 08:55 PM Martijn van Groningen 239 kB":

STEP 1 - I run a search using field collapsing and the result is correct: the top result has a score of 0.477.
STEP 2 - I run the same search and field collapsing returns a different result with a score of 0.17; the (correct) result from step 1 does not appear again.

Possible problem
Step 1 stores the document in the cache for future searches;
at step 2 the search is done over the cache and does not find the previously stored document.

Possible solution
I believe the problem is in how the document is stored in the cache, since if we run step 2 again we get the same result: the document with a score of 0.17 is not removed from the results, and the only result removed is the document with a score of 0.477.

Conclusion
Documents are not sorted properly when using field collapsing together with the Solr cache, that is, when documents stored in the Solr cache are required.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Domingo Gómez García (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Domingo Gómez García updated SOLR-236:
--------------------------------------

    Comment: was deleted

(was: The results of collapse_counts are not what I expected. It loses many categories, only showing a few. I tried incrementing the collapse.max parameter:

max=1 results 

<lst name="doc">
<int name="2008/LICOBLE-00023">109</int>
<int name="2008/LICOBLE-3">5</int>
<int name="2009/LICOBLE-00036">4</int>
<int name="2009/LICOBLE-00095">1</int>
</lst>
<lst name="count">
<int name="12740">109</int>
<int name="12741">5</int>
<int name="13282">4</int>
<int>1</int>
</lst>


max=2 results

<lst name="doc">
<int name="2009/LICOBLE-00008">108</int>
<int name="2007/LICOBLE-1">4</int>
</lst>
<lst name="count">
<int name="12740">108</int>
<int name="12741">4</int>
</lst>


max=3 results

<lst name="doc">
<int name="2008/LICOBLE-00020">107</int>
<int name="2008/LICOBLE-00021">3</int>
</lst>
<lst name="count">
<int name="12740">107</int>
<int name="12741">3</int>
</lst>


max=4

<lst name="doc">
<int name="2009/LICOBLE-00060">106</int>
</lst>
<lst name="count">
<int name="12740">106</int>
</lst>

How is it possible to get fewer results each time? There are about 70 categories; is there any way to obtain all those counts? Am I missing some collapsing concept?
Thanks.)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Karsten Sperling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karsten Sperling updated SOLR-236:
----------------------------------

    Attachment: field-collapsing-extended-592129.patch

I've done some work on the field collapsing patch, made some additions and changes, and am posting this patch (against revision 592129) here for discussion.

- Added a collapse.facet = before|after parameter to control if faceting happens before or after collapsing.
- Changed collapse.max to collapse.threshold: this value sets the number of collapsible hits after which collapsing actually kicks in (collapse.max is still supported as an alias).
- Added a collapse.maxdocs parameter that limits the number of documents that CollapseFilter will process to create the filter DocSet. The intention of this is to be able to limit the time collapsing will take for very large result sets (obviously at the expense of accurate collapsing in those cases).
- Inverted the logic of the filter DocSet created by CollapseFilter to contain the documents that are to be collapsed instead of the ones that are to be kept. Without this, collapse.maxdocs doesn't work.
- Added collapse.info.doc and collapse.info.count parameters to provide more control over what gets returned in the collapse_counts extra results.
- Made a minimal change to SolrIndexSearcher.getDocListC() to support passing both the filter and filterList parameters. In most cases this was already handled anyway.
- Did some general refactoring and added comments and a test case.
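The point about inverting the filter can be illustrated with a small sketch. It is hypothetical and simplified (normal collapsing only, plain int doc IDs, java.util.BitSet standing in for a Solr DocSet, and `InvertedCollapseFilter` is an invented name): only the first maxDocs hits are examined, and the filter records the hits to remove. Hits beyond the maxDocs limit are never marked, so they survive; a filter recording the hits to keep would silently drop them.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: why a "collapsed docs" filter works with a maxdocs limit. */
final class InvertedCollapseFilter {

    /**
     * Scans at most maxDocs hits (doc IDs with their collapse-field values, in ranked order)
     * and marks every hit beyond the first `threshold` per value as collapsed.
     */
    static BitSet collapsedDocs(int[] docIds, String[] values, int threshold, int maxDocs) {
        BitSet collapsed = new BitSet();
        Map<String, Integer> seen = new HashMap<>();
        int limit = Math.min(maxDocs, docIds.length);
        for (int i = 0; i < limit; i++) {
            if (seen.merge(values[i], 1, Integer::sum) > threshold) {
                collapsed.set(docIds[i]);
            }
        }
        return collapsed;
    }

    /** Applies the inverted filter: every hit not marked as collapsed survives, including unscanned ones. */
    static List<Integer> apply(int[] docIds, BitSet collapsed) {
        List<Integer> result = new ArrayList<>();
        for (int doc : docIds) {
            if (!collapsed.get(doc)) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

For doc IDs 10, 11, 12, 13 with values x, x, x, y, threshold = 1 and maxDocs = 2, only doc 11 is marked as collapsed, so `apply` returns [10, 12, 13]; doc 12 stays uncollapsed because it was never scanned, which is the accuracy trade-off mentioned for collapse.maxdocs.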

If somebody with deeper Solr/Lucene knowledge could review these changes it would be much appreciated.

Karsten


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Domingo Gómez García (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701862#action_12701862 ] 

Domingo Gómez García edited comment on SOLR-236 at 4/29/09 4:29 AM:
--------------------------------------------------------------------

I checked out release-1.3.0 from SVN and applied SOLR-236_collapsing.patch. When I use the collapse parameters I always get PermGen exceptions. How much memory can collapsing use compared with normal queries?

      was (Author: dgomezca):
    I checked out release-1.3.0 from SVN and applied SOLR-236_collapsing.patch. I have upgraded from 1.2 to 1.3.0 (patched) and I get a lot of PermGen exceptions, especially in calls from SolrJ.
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793048#action_12793048 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

bq. I support your suggestion on splitting this issue into two, i.e. make the core changes in a separate patch. That is the plan anyway.

The changes in the core that should be in a separate patch are:
# SolrIndexSearcher
# DocSetHitCollector
# DocSetAwareCollector

The above files were changed for the following reasons:
# The getDocSet(...) methods in the SolrIndexSearcher did not allow me to specify a Lucene Collector, which I needed to get the uncollapsed docset while leveraging the Solr caches. I changed them so that I could do that.
# The patch also contains an extra getDocListAndSet(...) method that allows specifying a filter docset, which in the case of field collapsing is the collapsed docset. 

The QueryComponent has changed as well. The only reason these changes were made was to support the pseudo-distributed field collapsing. Maybe for distributed field collapsing a separate patch should be created, with this change as a start. Last but not least, the SolrJ code: I think a separate patch should be created for these changes as well. Maybe a sub-issue should be created in JIRA for each patch.

The rest of the files in the patch do not impact any core files, and I think they should remain in one patch.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Leon Messerschmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839545#action_12839545 ] 

Leon Messerschmidt commented on SOLR-236:
-----------------------------------------

The OutOfMemory problem affects both field-collapse-5.patch on Solr 1.4 and SOLR-236.patch on the trunk.

The root cause of the problem is DocSetScoreCollector, which creates a float array sized to the largest doc id that matches the query. If you have a large index (we have several million documents) and a document with a very large id is matched, you may end up with a huge array (in our case several hundred MB). Only a really small subset of the array is used at any given time (especially if you're matching just a few documents with big doc ids).

The implementation could instead use a sparse array or a map to keep track of scores.
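A minimal sketch of that suggestion, assuming a plain HashMap keyed by doc id is an acceptable sparse structure. The class name SparseScores is hypothetical and not part of any attached patch; memory grows with the number of matching documents rather than with the largest matching doc id, and denseBytes shows what a dense float[] indexed by doc id would cost for comparison.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse replacement for a dense float[maxDocId + 1] score array. */
public class SparseScores {
    private final Map<Integer, Float> scores = new HashMap<>();

    /** Record the score of a matching document. */
    public void collect(int docId, float score) {
        scores.put(docId, score);
    }

    /** Score of a collected document, or 0f if it did not match. */
    public float score(int docId) {
        return scores.getOrDefault(docId, 0f);
    }

    public int size() {
        return scores.size();
    }

    /** Bytes a dense float[] would need just to hold an entry for maxDocId (4 bytes per slot). */
    public static long denseBytes(int maxDocId) {
        return 4L * (maxDocId + 1L);
    }

    public static void main(String[] args) {
        SparseScores s = new SparseScores();
        s.collect(9_999_999, 1.5f); // a single match with a very large doc id
        s.collect(42, 0.25f);
        System.out.println(s.size());              // 2
        System.out.println(s.score(9_999_999));    // 1.5
        System.out.println(denseBytes(9_999_999)); // 40000000
    }
}
```

The trade-off is that each boxed HashMap entry costs several dozen bytes, so a map only wins when the number of matches is small relative to maxDoc; for large result sets a dense array (or a primitive int-to-float hash map) is cheaper.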

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792115#action_12792115 ] 

Shalin Shekhar Mangar commented on SOLR-236:
--------------------------------------------

{quote}
I'd define large scale for this in a couple of ways:
1. Lots of docs in the result set (10K+)
2. Lots of overall docs (100M+)
3. Lots of queries (> 10 QPS) 
{quote}

Grant, this patch may not be perfect, but I think we all agree that it is a great start. It is stable, used by many, and has been well supported by the community. It is also a large patch, and as I know from my DataImportHandler experience, maintaining a large patch is quite a pain (and DataImportHandler didn't even touch the core). How about we commit this (after some review, of course), mark it as experimental (no guarantees of any sort), and then start improving it one issue at a time? Alternatively, if you are not comfortable adding it to trunk, we can commit this on a branch and merge it into trunk later.

What do you think?



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Stephen Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679603#action_12679603 ] 

jove4015 edited comment on SOLR-236 at 3/6/09 6:13 AM:
------------------------------------------------------------

Help!!

We've been using this patch in production for months now, and suddenly in the last 3 days it is crashing constantly.

[Edit - It's Ivan's latest patch, #3]

Mar 6, 2009 5:23:50 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
	at org.apache.solr.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:701)
	at org.apache.solr.util.OpenBitSet.ensureCapacity(OpenBitSet.java:711)
	at org.apache.solr.util.OpenBitSet.expandingWordNum(OpenBitSet.java:280)
	at org.apache.solr.util.OpenBitSet.set(OpenBitSet.java:221)
	at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:217)
	at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:171)
	at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:139)
	at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:52)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:324)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)


It seems to happen randomly: there's no special request happening, nothing new added to the index, nothing. We've made no configuration changes. The only thing that has happened is that more documents have been added since then. The schema is the same; we have perhaps 200,000 more documents in the index now than we did when we first went live with it.

It was a 32-bit machine with 2GB of RAM allocated to Java. We just upgraded it to 64-bit and increased the heap space to 3GB, and it still went down last night. I'm at my wits' end. I don't know what to do, but this functionality has been live for so long now that it's going to be extremely painful to take it away. Someone, please tell me if there's anything I can do to save this thing.



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717110#action_12717110 ] 

Earwin Burrfoot commented on SOLR-236:
--------------------------------------

I have implemented collapsing on a high-volume project of mine in a much less flexible, but more practical, manner.

Part I. You have to guarantee that all documents having the same value of the collapse field are added to the Lucene index as one sequential batch. That guarantees they get sequential docIds and, with some more work, that they all end up in the same segment.
Part II. During collection you always receive docIds in increasing order, so, thanks to Part I, the docs to be collapsed arrive already grouped by the collapse field, even before you drop them into the PriorityQueue to sort them.

Cons:
You can only collapse on a single field, chosen at index creation time.
If one document changes, you have to reindex all docs that have the same collapse-field value, so it's best if you have either low update/add rates or few documents sharing the same collapse-field value.

Pros:
The CPU and memory costs of collapsing, compared to a usual search, are very close to zero and do not depend on index size or total docs found.
The same idea works with the new Lucene per-segment collection and in distributed mode (sharded index).
Within a collapsed group you can sort hits however you want, and select the one that will represent the group for the usual sorting/paging.
The implementation is not brain-dead simple, but it comes close.
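A rough Java sketch of Part II, illustrating the idea rather than Earwin's actual implementation. It assumes a hypothetical groupOf array mapping each doc id to a group ordinal (in practice something like a field cache lookup) and that collect() is called with increasing doc ids. Because each group's documents are contiguous, the collector only has to remember the current run and its best-scoring document, emitting one representative per group.

```java
import java.util.ArrayList;
import java.util.List;

/** Collapses during collection, assuming each group was indexed as one contiguous batch. */
public class SequentialCollapser {
    /** Representative (best-scoring) hit of one group, plus how many matches it stands for. */
    public record GroupHit(int group, int docId, float score, int groupSize) {}

    private final int[] groupOf; // doc id -> group ordinal (hypothetical lookup)
    private final List<GroupHit> out = new ArrayList<>();
    private int curGroup = -1, bestDoc = -1, count = 0;
    private float bestScore = Float.NEGATIVE_INFINITY;

    public SequentialCollapser(int[] groupOf) {
        this.groupOf = groupOf;
    }

    /** Must be called with doc ids in increasing order, as a Lucene collector sees them. */
    public void collect(int docId, float score) {
        int g = groupOf[docId];
        if (g != curGroup) { // a new run starts, so the previous group is complete
            flush();
            curGroup = g;
        }
        count++;
        if (score > bestScore) {
            bestScore = score;
            bestDoc = docId;
        }
    }

    /** Emit the representative of the current run, if it has any matches, and reset. */
    private void flush() {
        if (count > 0) {
            out.add(new GroupHit(curGroup, bestDoc, bestScore, count));
        }
        count = 0;
        bestDoc = -1;
        bestScore = Float.NEGATIVE_INFINITY;
    }

    /** Flush the last run and return one hit per group, in doc id order. */
    public List<GroupHit> finish() {
        flush();
        return out;
    }

    public static void main(String[] args) {
        int[] groupOf = {0, 0, 0, 1, 1, 2}; // docs 0-2, 3-4 and 5 were indexed as batches
        SequentialCollapser c = new SequentialCollapser(groupOf);
        c.collect(0, 1.0f);
        c.collect(2, 3.0f);
        c.collect(3, 0.5f);
        c.collect(5, 2.0f);
        for (GroupHit h : c.finish()) {
            System.out.println(h);
        }
    }
}
```

The emitted GroupHits could then go into the usual PriorityQueue for sorting and paging; apart from the output list, the extra state per query is constant.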



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721100#action_12721100 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

I have not found an online example yet, but I copied this config from the javadoc of the DistanceCalculatingComponent class and modified it. 



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Woodard (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777659#action_12777659 ] 

Thomas Woodard commented on SOLR-236:
-------------------------------------

I tried the build again, and you are right, it does work fine with the default search handler. I had been trying to get it working with our search handler, which is dismax. That still doesn't work. Here is the handler configuration, which works fine until collapsing is added.

{code:xml}
<requestHandler name="glsearch" class="solr.SearchHandler">
	<lst name="defaults">
		<str name="defType">dismax</str>
		<str name="qf">name^3 description^2 long_description^2 search_stars^1 search_directors^1 product_id^0.1</str>
		<str name="tie">0.1</str>
		<str name="facet">true</str>
		<str name="facet.field">stars</str>
		<str name="facet.field">directors</str>
		<str name="facet.field">keywords</str>
		<str name="facet.field">studio</str>
		<str name="facet.mincount">1</str>
	</lst>
</requestHandler>
{code}



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

I have created a new patch with the following changes:
1) Non-adjacent collapsing with sorting on score now also uses the Solr caches, so every field-collapse search now uses the Solr caches properly. This was not the case in previous versions of my patch. This improvement makes field collapsing perform better and reduces the query time for regular searches. The downside is that, to make this work, I had to modify some methods in SolrIndexSearcher.

When sorting on score, the non-adjacent collapsing algorithm needs the score of each document. The score is collected in a Lucene collector. The previous version of the patch used the searcher.search(Query, Filter, Collector) method to collect the documents (as a DocSet) and their scores, but that method bypasses the Solr caches.

The methods in SolrIndexSearcher that return a DocSet do not offer the ability to specify your own collector. I changed that so you can specify your own collector and still benefit from the Solr caches. I did this in a non-intrusive manner, so nothing changes for existing code that uses the normal versions of these methods.
{code}
  // Existing signatures behave as before: each creates the default DocSetCollector
  // (small-set size maxDoc()/64) and delegates to the new overload.
  public DocSet getDocSet(Query query) throws IOException {
    DocSetCollector collector = new DocSetCollector(maxDoc() >> 6, maxDoc());
    return getDocSet(query, collector);
  }

  // New overload: the caller supplies the collector, e.g. a DocSetScoreCollector.
  public DocSet getDocSet(Query query, DocSetAwareCollector collector) throws IOException {
    // ... (body elided)
  }

  DocSet getPositiveDocSet(Query q) throws IOException {
    DocSetCollector collector = new DocSetCollector(maxDoc() >> 6, maxDoc());
    return getPositiveDocSet(q, collector);
  }

  DocSet getPositiveDocSet(Query q, DocSetAwareCollector collector) throws IOException {
    // ... (body elided)
  }

  public DocSet getDocSet(List<Query> queries) throws IOException {
    DocSetCollector collector = new DocSetCollector(maxDoc() >> 6, maxDoc());
    return getDocSet(queries, collector);
  }

  public DocSet getDocSet(List<Query> queries, DocSetAwareCollector collector) throws IOException {
    // ... (body elided)
  }

  protected DocSet getDocSetNC(Query query, DocSet filter) throws IOException {
    DocSetCollector collector = new DocSetCollector(maxDoc() >> 6, maxDoc());
    return getDocSetNC(query, filter, collector);
  }

  protected DocSet getDocSetNC(Query query, DocSet filter, DocSetAwareCollector collector) throws IOException {
    // ... (body elided)
  }
{code}
I also made a DocSetAwareCollector that both DocSetCollector and DocSetScoreCollector implement.
2) The collapse.includeCollapsedDocs parameter has been removed. To include the collapsed documents, specify the collapse.includeCollapsedDocs.fl parameter instead: collapse.includeCollapsedDocs.fl=* includes all fields of the collapsed documents, and collapse.includeCollapsedDocs.fl=id,name includes only their id and name fields.
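The overload pattern from item 1 can be illustrated with a self-contained Java sketch. Everything here is a stand-in: java.util.BitSet replaces Solr's DocSet, a plain loop over precomputed hits replaces the actual search, and the interface shape is a guess at what DocSetAwareCollector looks like, not the patch's real signatures. The point is that the old signature keeps its behaviour by creating the default collector and delegating, while callers that need scores pass their own collector.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class CollectorOverloads {

    /** Hypothetical stand-in for the patch's DocSetAwareCollector. */
    public interface DocSetAwareCollector {
        void collect(int doc, float score);
        BitSet getDocSet();
    }

    /** Collects matching doc ids only, like a plain DocSetCollector. */
    public static class DocSetCollector implements DocSetAwareCollector {
        private final BitSet docs = new BitSet();

        @Override
        public void collect(int doc, float score) {
            docs.set(doc);
        }

        @Override
        public BitSet getDocSet() {
            return docs;
        }
    }

    /** Additionally records each document's score, like DocSetScoreCollector. */
    public static class DocSetScoreCollector extends DocSetCollector {
        private final Map<Integer, Float> scores = new HashMap<>();

        @Override
        public void collect(int doc, float score) {
            super.collect(doc, score);
            scores.put(doc, score);
        }

        public float score(int doc) {
            return scores.getOrDefault(doc, 0f);
        }
    }

    /** Old signature, unchanged for existing callers: default collector, then delegate. */
    public static BitSet getDocSet(int[] hits, float[] scores) {
        return getDocSet(hits, scores, new DocSetCollector());
    }

    /** New overload: the caller chooses the collector. */
    public static BitSet getDocSet(int[] hits, float[] scores, DocSetAwareCollector collector) {
        // Stand-in for running the query: feed every hit to the collector.
        for (int i = 0; i < hits.length; i++) {
            collector.collect(hits[i], scores[i]);
        }
        return collector.getDocSet();
    }

    public static void main(String[] args) {
        int[] hits = {3, 7, 12};
        float[] scores = {0.5f, 2.0f, 1.25f};
        System.out.println(getDocSet(hits, scores)); // {3, 7, 12}
        DocSetScoreCollector withScores = new DocSetScoreCollector();
        getDocSet(hits, scores, withScores);
        System.out.println(withScores.score(7));     // 2.0
    }
}
```

Because only new overloads are added, code compiled against the old signatures keeps working unchanged, which matches the "non-intrusive" goal described in item 1.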



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754212#action_12754212 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

Hey Martijn,
Have you made any progress on making field collapsing distributed?
Oleg

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714442#action_12714442 ] 

Martijn van Groningen edited comment on SOLR-236 at 5/29/09 6:02 AM:
---------------------------------------------------------------------

Hi,

I have modified Thomas's latest patch and made two performance improvements: 
1) Improved normal field collapsing. I tested it with an index of 1.1 million documents. When collapsing on all documents with no sorting specified (so sorting on score), the query time is around 130 ms, compared with around 1.5 s for the previous patch. When I then add sorting on a string field, the query time is around 220 ms, compared with around 5.2 s for the previous patch. 

It is faster because the previous patch queries for a DocList instead of a DocSet. The normal collapse method keeps track of the most relevant documents itself, so the end result is the same, while creating (and sorting) a DocList of 1.1 million documents is very expensive.
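
To illustrate why normal collapsing can work on an unordered DocSet and still keep the most relevant documents, here is a sketch assuming a per-value min-heap bounded at the collapse threshold. It is not the patch's implementation: Doc, collapse and the threshold handling are hypothetical, and it uses java.util.PriorityQueue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration, not the patch's code: "normal" collapsing over an
// unordered set of matches, keeping the `threshold` most relevant documents per value.
class NormalCollapseSketch {

    record Doc(int id, String group, double score) {}

    static Map<String, List<Doc>> collapse(List<Doc> matches, int threshold) {
        Map<String, PriorityQueue<Doc>> perGroup = new HashMap<>();
        for (Doc doc : matches) {
            // Min-heap by score: the head is the least relevant document kept so far.
            PriorityQueue<Doc> queue = perGroup.computeIfAbsent(doc.group(),
                    g -> new PriorityQueue<Doc>(Comparator.comparingDouble(Doc::score)));
            queue.add(doc);
            if (queue.size() > threshold) {
                queue.poll(); // the least relevant document of this group is collapsed
            }
        }
        // Only the small per-group survivor lists are sorted, most relevant first.
        Map<String, List<Doc>> result = new HashMap<>();
        for (Map.Entry<String, PriorityQueue<Doc>> e : perGroup.entrySet()) {
            List<Doc> kept = new ArrayList<>(e.getValue());
            kept.sort(Comparator.comparingDouble(Doc::score).reversed());
            result.put(e.getKey(), kept);
        }
        return result;
    }
}
```

Each match costs only O(log threshold) heap work, and no list of all matches is ever sorted, which is the property the explanation above relies on. An adjacent mode cannot use this shortcut, because it needs the full rank order.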

Note: I did not improve adjacent collapsing, because (as far as I understand it) the adjacent method needs a completely sorted list of documents (DocList).

2) Slightly improved faceting in combination with field collapsing by reusing the uncollapsed DocSet that is created during the collapsing process (the previous patch invoked a second search).

I have also added documentation, added a few unit tests for the collapsing process itself, and made the debug information more readable.

I'm very interested in other people's experiences with this patch and feedback on the patch itself. 

Cheers,

Martijn 


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

Hi Thomas, I have fixed the problem and updated the patch. I was able to reproduce the bug on the Solr example dataset. The problem was not limited to field collapsing combined with sorting on a field. It was located in the doCollapse(...) method of NonAdjactentFieldCollapser, in this specific part:
{code}
      // dropOutId has a value smaller than the smallest value in the queue and therefore it was removed from the queue
      Integer dropOutId = (Integer) collapseDoc.priorityQueue.insertWithOverflow(currentId);

      // check if we have reached the collapse threshold, if so start counting collapsed documents
      if (++collapseDoc.totalCount > collapseTreshold) {
        collapseDoc.collapsedDocuments++;
        if (dropOutId != null) {
          addCollapsedDoc(currentId, currentValue);
        }
      }
{code}
Let's say that currentId has the most relevant field value and the collapse threshold is met. When currentId is added to the queue, it stays there and another document id is dropped out. In this situation, the document with the most relevant field value is added to the collapsed documents, but it also stays in the queue and will therefore also be added to the normal results. 

I changed it to this:
{code}
      // dropoutId has a value smaller than the smallest value in the queue and therefore it was removed from the queue
      Integer dropOutId = (Integer) collapseDoc.priorityQueue.insertWithOverflow(currentId);

      // check if we have reached the collapse threshold, if so start counting collapsed documents
      if (++collapseDoc.totalCount > collapseTreshold) {
        collapseDoc.collapsedDocuments++;
        if (dropOutId != null) {
          addCollapsedDoc(dropOutId, currentValue);
        }
      }
{code}
Now only a document that will never end up in the final results is added to the collapsed documents (and not the current document, which might be more relevant than other documents in the priority queue). The above code change fixes the bug in my test setups; can you also confirm that it fixes the issue on your side?
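
The fix can be checked in isolation with a small simulation. BoundedQueue below is a hypothetical stand-in that follows the contract of Lucene's PriorityQueue.insertWithOverflow (it returns null while the queue is not yet full, otherwise whichever element fell out); it is built on java.util.PriorityQueue and is not Lucene's class. Document ids double as relevance here, a larger id meaning a more relevant field value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of the corrected bookkeeping. BoundedQueue mimics the
// insertWithOverflow contract on top of java.util.PriorityQueue; it is not Lucene's class.
class OverflowFixSketch {

    static final class BoundedQueue {
        private final PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
        private final int maxSize;

        BoundedQueue(int maxSize) {
            this.maxSize = maxSize;
        }

        // Returns null if nothing was dropped, otherwise the element that fell out:
        // either the evicted least relevant entry or `element` itself.
        Integer insertWithOverflow(int element) {
            if (heap.size() < maxSize) {
                heap.add(element);
                return null;
            }
            if (maxSize > 0 && element >= heap.peek()) {
                Integer dropped = heap.poll();
                heap.add(element);
                return dropped;
            }
            return element;
        }
    }

    // Returns the ids recorded as collapsed for one group, in the order they were dropped.
    static List<Integer> collapseGroup(int[] docIds, int collapseThreshold) {
        BoundedQueue queue = new BoundedQueue(collapseThreshold);
        List<Integer> collapsed = new ArrayList<>();
        int totalCount = 0;
        for (int currentId : docIds) {
            Integer dropOutId = queue.insertWithOverflow(currentId);
            if (++totalCount > collapseThreshold && dropOutId != null) {
                // The fix: record the id that left the queue, not currentId,
                // which may still be in the queue (and thus in the normal results).
                collapsed.add(dropOutId);
            }
        }
        return collapsed;
    }

    public static void main(String[] args) {
        // 9 is the most relevant id and must not appear among the collapsed ids.
        System.out.println(collapseGroup(new int[] {1, 5, 3, 9}, 2)); // prints [1, 3]
    }
}
```

With the original addCollapsedDoc(currentId, ...) call, the same input would record 3 and 9, so the most relevant id, 9, would be reported as collapsed while still sitting in the queue.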

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Traeger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716364#action_12716364 ] 

Thomas Traeger commented on SOLR-236:
-------------------------------------

Ron, your approach should work; I just verified it on my Ubuntu 9.04 box. Here are my steps to a working example installation of Solr 1.3.0 with collapsing enabled:

{noformat}
java -version
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)

wget http://www.apache.org/dist/lucene/solr/1.3.0/apache-solr-1.3.0.tgz
tar xvzf apache-solr-1.3.0.tgz 
wget http://issues.apache.org/jira/secure/attachment/12407410/SOLR-236_collapsing.patch
cd apache-solr-1.3.0/
patch -p0 <../SOLR-236_collapsing.patch 
mv src/common/org/apache/solr/common/params/CollapseParams.java src/java/org/apache/solr/common/params/
ant example
cd example/
vi solr/conf/solrconfig.xml 
{noformat}

add the collapse component class definition:

{noformat}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" />
{noformat}

set the components in the standard requestHandler:

{noformat}
    <arr name="components">
      <str>collapse</str>
    </arr>
{noformat}

start Jetty:
{noformat}
java -jar start.jar
{noformat}
add the example docs:
{noformat}
cd example/exampledocs
sh post.sh *.xml
{noformat}
and open http://localhost:8983/solr/select/?q=*:*&collapse.field=cat in your browser.
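
For issuing the same request from code rather than a browser, a sketch of a hypothetical helper that builds the select URL with each parameter URL-encoded. Only q and collapse.field come from the steps above; any other collapse parameters would depend on the patch version you applied.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the same select request as the browser URL above,
// URL-encoding every parameter name and value.
class CollapseUrlSketch {

    static String selectUrl(String solrBase, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return solrBase + "/select/?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("q", "*:*");
        params.put("collapse.field", "cat");
        System.out.println(selectUrl("http://localhost:8983/solr", params));
        // prints http://localhost:8983/solr/select/?q=*%3A*&collapse.field=cat
    }
}
```

The encoded %3A is decoded back to ':' by the servlet container, so this is equivalent to the URL typed into the browser.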

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600281#action_12600281 ] 

Otis Gospodnetic commented on SOLR-236:
---------------------------------------

It's amazing this issue/patch has so many votes and watchers, yet it's stuck...
Ryan, Yonik, Emmanuel, Doug, Charles, Karsten

I think Bojan is onto something here.  Isn't the ability to *chain QueryComponent (QC) and CollapseComponent (CC) essential*?

I'm looking at field_collapsing_dsteigerwald.diff and see that the *CC.prepare method there is identical to the QC.prepare method*, while the process methods are different. Could we solve this particular copy/paste situation by *making CC extend QC and simply override the process method*?

As for chaining, could CC take the same approach as the MLT Component, which simply does its thing to find "more like this" docs and stuffs them into the "moreLikeThis" element in the response?

I could be misunderstanding something, so please correct me if I'm wrong.  I'd love to get this one in 1.3 -- it's been waiting in JIRA for too long. :)


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: SOLR-236-FieldCollapsing.patch

Sorry, my last post was buggy. Here is the correct one; the exception no longer occurs.
About tokens: if any token within the field matches, the documents will collapse.
When I started implementing collapsing, my need was to group documents having exactly identical field values.

I believe faceting behaves the same way. Look at "Graphic card" as an example:
http://localhost:8983/solr/select/?q=cat:graphic%20card&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=cat

I will try to maintain the wiki page.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Matthias Epheser (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621470#action_12621470 ] 

Matthias Epheser commented on SOLR-236:
---------------------------------------

I just tried to apply the last patch and ran into 2 issues:

First: 

The new getDocListAndSet(Query query, List<Query>..) method in SolrIndexSearcher calls the getDocListC(..) method using the old signature. I changed the call to the new signature and it worked very well:
{code}
DocListAndSet ret = new DocListAndSet();
QueryResult queryResult = new QueryResult();
queryResult.setDocListAndSet(ret);
queryResult.setPartialResults(false);
QueryCommand queryCommand = new QueryCommand();
queryCommand.setQuery(query);
// note: QueryCommand rejects having both a filterList and a filter set
queryCommand.setFilterList(filterList);
queryCommand.setFilter(docSet);
queryCommand.setSort(lsort);
queryCommand.setOffset(offset);
queryCommand.setLen(len);
queryCommand.setFlags(flags | GET_DOCSET); // also request the DocSet
getDocListC(queryResult, queryCommand);
{code}


Second:

After adding more docs (~3000), I got an exception in SolrIndexSearcher at line ~1300:
{code}
qr.setDocSet(filter == null ? qDocSet : qDocSet.intersection(filter));
{code}

As NegotiatedDocSet doesn't implement the iterator() method, this call led to an UnsupportedOperationException. I naively implemented the method as "return source.iterator()", which works fine for me.
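
A sketch of the delegation pattern described above, using hypothetical types (SimpleDocSet, ListDocSet, WrappingDocSet) rather than Solr's DocSet API. It also makes one possible side effect explicit: delegating iterator() is only correct while the wrapper represents exactly the same documents as its source. If the wrapper ever adds or removes documents relative to the source, iterating the source would silently return the wrong set.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical types, not Solr's DocSet API: a wrapper whose iterator() used to throw
// UnsupportedOperationException now delegates to the set it wraps.
class DelegatingDocSetSketch {

    interface SimpleDocSet extends Iterable<Integer> {
        int size();
    }

    record ListDocSet(List<Integer> ids) implements SimpleDocSet {
        public int size() {
            return ids.size();
        }

        public Iterator<Integer> iterator() {
            return ids.iterator();
        }
    }

    static final class WrappingDocSet implements SimpleDocSet {
        private final SimpleDocSet source;

        WrappingDocSet(SimpleDocSet source) {
            this.source = source;
        }

        public int size() {
            return source.size();
        }

        // Correct only while this wrapper holds exactly the documents of `source`.
        public Iterator<Integer> iterator() {
            return source.iterator();
        }
    }
}
```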


Since the first issue is clear-cut, I wanted to check my approach to the second one before posting a patch. There may be side effects that I missed.  


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Dmitry Lihachev (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689028#action_12689028 ] 

Dmitry Lihachev commented on SOLR-236:
--------------------------------------

When I add a filter query (fq param) to my query, I get the exception "Either filter or filterList may be set in the QueryCommand, but not both."

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yao Ge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840470#action_12840470 ] 

Yao Ge commented on SOLR-236:
-----------------------------

I just applied the latest patch to trunk, and I don't quite understand how "numFound" in the response is computed. With rows=10&collapse.threshold=1, I got numFound=11; with rows=10&collapse.threshold=2, I got numFound=22.
In both cases the list actually contains 10 documents. Why is numFound reported this way?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749959#action_12749959 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Thomas, I agree that this feature is very handy in your situation. However, if you want to return whole documents (with all fields) and your groups are reasonably large, this will increase your response time dramatically. I think a better approach is to return only the fields you need for your calculation, say a price field for an average price per group. Instead of returning 10 fields for each document in a group (of, say, 7000 documents), you would return only one, which will save you a lot of response time. 
What do you think about this approach?
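
A sketch of the client-side half of this approach, assuming the request was limited (for example with Solr's fl field-list parameter) to the collapse field and a price field. The field names group and price, and the PriceDoc record, are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration: only two fields were fetched per collapsed document
// (the group value and a price); the client computes the per-group average itself.
class GroupAverageSketch {

    record PriceDoc(String group, double price) {}

    static Map<String, Double> averagePricePerGroup(List<PriceDoc> docs) {
        // TreeMap keeps the groups in a stable, sorted order.
        return docs.stream().collect(Collectors.groupingBy(
                PriceDoc::group, TreeMap::new, Collectors.averagingDouble(PriceDoc::price)));
    }

    public static void main(String[] args) {
        List<PriceDoc> docs = List.of(
                new PriceDoc("a", 10.0), new PriceDoc("a", 20.0), new PriceDoc("b", 5.0));
        System.out.println(averagePricePerGroup(docs)); // prints {a=15.0, b=5.0}
    }
}
```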

I also think the Ajax response solution that Darrell describes is a good way to go. 

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791977#action_12791977 ] 

Grant Ingersoll commented on SOLR-236:
--------------------------------------

Is there a typo on the http://wiki.apache.org/solr/FieldCollapsing page in regards to the outputs?  There are two different output results, but the URLs for the two examples are the same.  See http://wiki.apache.org/solr/FieldCollapsing#Examples.  I think the second one is intended to show a collapse count for fields?

Also, I'm not sold on having the collapse elements separate from the actual response (I know other components do it too, so it isn't a huge deal), but the list of "parallel arrays" that one needs to traverse in order to render results is growing (highlighter, MLT, now Field Collapsing).
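One way a client can cope with these parallel structures is to join them by document id before rendering. This is a hedged sketch: it assumes the main result list, the highlighting section and the collapse counts are all keyed by the same unique id; ParallelJoin and View are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that merges parallel response sections into one view per document. */
public final class ParallelJoin {

    /** Everything needed to render one result row. */
    public record View(String id, String snippet, int collapseCount) {}

    public static List<View> join(List<String> docIds,
                                  Map<String, String> highlighting,
                                  Map<String, Integer> collapseCounts) {
        List<View> views = new ArrayList<>();
        for (String id : docIds) {
            // Missing entries mean "no highlight" / "nothing collapsed" for that document.
            views.add(new View(id,
                    highlighting.getOrDefault(id, ""),
                    collapseCounts.getOrDefault(id, 0)));
        }
        return views;
    }
}
```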

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Tracy Flynn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538332 ] 

Tracy Flynn commented on SOLR-236:
----------------------------------

Ryan,

Thanks for the quick reply and clarification. I'll follow your suggestion as to where to apply and try the patch.

I'll be eagerly waiting for the updated trunk.

Regards,

Tracy



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Stephen Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658461#action_12658461 ] 

Stephen Weiss commented on SOLR-236:
------------------------------------

Yes!  It does work.  Thank you both so much!  It's been running for 5 days now without a hiccup.  This is going into production use now (we'll be monitoring), they simply can't wait for the functionality.  From here it looks like if you get faceting tidied up and some docs written, they should be including this soon!

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750582#action_12750582 ] 

Martijn van Groningen edited comment on SOLR-236 at 9/2/09 11:18 AM:
---------------------------------------------------------------------

Yes, specifying which collapse fields to return is a good idea, just like the fl parameter for a normal request.
I was thinking about how to fit this new feature into the current patch, and I thought it might be a good idea to revise the current field collapse result format so that the results of this feature fit nicely into the response.

Currently the collapse response is like this:
{code:xml}
<lst name="collapse_counts">
        <str name="field">venue</str>
        <lst name="doc">
            <int name="233238">1</int>
        </lst>
        <lst name="count">
            <int name="melkweg">1</int>
        </lst>
</lst>
{code}

I think a response format like the following would be more ....
{code:xml}
<lst name="collapse_counts">
        <str name="field">venue</str>
        <lst name="results">
            <lst name="233238">
                 <str name="fieldValue">melkweg</str>
                 <int name="collapseCount">2</int>
                 <lst name="collapsedValues">
                     <str name="price">10.99, "1.999,99"</str>
                     <str name="name">adapter, laptop</str>
                 </lst>
            </lst>
        </lst>
</lst>
{code}
As you can see, the data is more bundled together and therefore easier to parse. The collapsedValues element can have one or more fields, each containing collapsed field values in a comma-separated format. The _collapsedValues_ element will of course only be added when the client specifies the collapsed fields in the request.
What do you think about this new result format?
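If this format were adopted, a client would need to split values such as 10.99, "1.999,99" on commas while respecting the double quotes around values that themselves contain commas. The proposal does not define escaping rules, so this sketch assumes plain double quotes with no escaped quote characters inside them; CollapsedValuesParser is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the proposed comma-separated collapsedValues strings. */
public final class CollapsedValuesParser {

    /** Splits on commas outside double quotes, trims whitespace and drops the quotes. */
    public static List<String> split(String raw) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle quoting; the quote itself is not kept
            } else if (c == ',' && !inQuotes) {
                values.add(current.toString().trim());
                current.setLength(0);          // start the next value
            } else {
                current.append(c);
            }
        }
        values.add(current.toString().trim()); // last value has no trailing comma
        return values;
    }
}
```

For the price example, this yields the two values 10.99 and 1.999,99; for the name example, adapter and laptop.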

  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Peter Karich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841752#action_12841752 ] 

Peter Karich commented on SOLR-236:
-----------------------------------

> Shouldn't the float array in DocSetScoreCollector be changed to a Map?

Hmm, maybe I expressed myself poorly: I already changed all of this to a Map (a SortedMap).
I started this change in DocSetScoreCollector and changed all the other occurrences of the float array (otherwise I would have had to copy the entire map).

> > I think the compare method should NOT be called if no docs are in the scores array ... ?

> I would expect that every docId has a score.

Yes, me too. So I suspect there is a bug somewhere. But as I said, this breaks only one test (collapse with faceting before). It could even be a bug in the test case, though.
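For the score lookup discussed here, a comparator over a SortedMap from docId to score can fail loudly when a docId has no score, which would make the suspected bug visible instead of silently comparing against a default. This is a hypothetical sketch (ScoreMapComparator is not the patch's DocSetScoreCollector code); ordering by descending score with ascending docId as tie-breaker is an assumption.

```java
import java.util.Comparator;
import java.util.SortedMap;

/** Hypothetical comparator backed by a docId -> score map. */
public final class ScoreMapComparator implements Comparator<Integer> {
    private final SortedMap<Integer, Float> scores;

    public ScoreMapComparator(SortedMap<Integer, Float> scores) {
        this.scores = scores;
    }

    /** Every docId is expected to have a score; a missing one signals a collector bug. */
    private float scoreOf(int docId) {
        Float s = scores.get(docId);
        if (s == null) {
            throw new IllegalStateException("no score collected for docId " + docId);
        }
        return s;
    }

    @Override
    public int compare(Integer a, Integer b) {
        // Higher score first; ties broken by ascending docId for a stable order.
        int byScore = Float.compare(scoreOf(b), scoreOf(a));
        return byScore != 0 ? byScore : Integer.compare(a, b);
    }
}
```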

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Thomas Woodard (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778443#action_12778443 ] 

Thomas Woodard commented on SOLR-236:
-------------------------------------

And this morning, without changing anything, it is working fine. I don't know what happened on Friday, but the changes I made then must have fixed it without showing up for some reason. In any case, thank you for the assistance.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Description: 
This patch includes a new feature called "Field collapsing".

"Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
"collapse.field" to choose the field used to group results
"collapse.type" normal (default value) or adjacent
"collapse.max" to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- "field_collapsing.patch" for current development version (1.2)
- "field_collapsing_1.1.0.patch" for Solr-1.1.0


P.S.: Feedback and misspelling corrections are welcome ;-)

  was:
This patch includes a new feature called "Field collapsing".

"Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 4 new query parameters (SolrParams):
"collapse" set to true to enable collapsing.
"collapse.field" to choose the field used to group results
"collapse.type" normal (default value) or adjacent
"collapse.max" to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases



> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing_1.1.0.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ron Veenstra (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716128#action_12716128 ] 

Ron Veenstra commented on SOLR-236:
-----------------------------------

I require assistance.  I've installed a fresh Solr (1.3.0), and everything appears to operate well.  I then patched it using SOLR-236_collapsing.patch (the last patch I saw that claimed to work with 1.3.0), without error.  I then added the following to solrconfig.xml (per http://wiki.apache.org/solr/FieldCollapsing):

  <searchComponent name="collapse"     class="org.apache.solr.handler.component.CollapseComponent" />

Upon restart, I get a long configuration error, which seems to hinge on:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solrconfig.xml ------------------------------------------------------------- org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.CollapseComponent' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)

[the full error can be included if desired.]

I've verified that the CollapseComponent file exists in the proper place.
I've moved CollapseParams as required (moved CollapseParams.java from common/org/apache/solr/common/params to java/org/apache/solr/common/params/).
I've tried multiple iterations of the patch (on fresh installs), all with the same issue.

Are there additional steps, patches, or configurations that are required?
Is this a known issue?
Any help is very much appreciated.
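Since the failure is SolrResourceLoader not finding org.apache.solr.handler.component.CollapseComponent, one quick check is whether the compiled class is visible on the classpath Solr actually runs with; the source file being present does not by itself mean the compiled class made it into the deployed webapp. ClassCheck is a hypothetical helper, and running it with the same classpath as the Solr webapp is an assumption about your deployment.

```java
/** Hypothetical diagnostic: can this JVM's class loader see a given class? */
public final class ClassCheck {

    /** Returns true if the named class can be loaded (without running its static initializers). */
    public static boolean canLoad(String className) {
        try {
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but a class it depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
                : "org.apache.solr.handler.component.CollapseComponent";
        System.out.println(name + (canLoad(name) ? " is loadable" : " is NOT on the classpath"));
    }
}
```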

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "ttdi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793898#action_12793898 ] 

ttdi commented on SOLR-236:
---------------------------

Hi Martijn van Groningen and other experts,
    When I use http://localhost:8080/search/?page=1, the page 1 results are collapsed, but when I use http://localhost:8080/search/?page=2, only the page 2 results are collapsed, not all records.
I want to collapse all records while using pagination. How can I do it?
Thanks!
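The behavior asked for here amounts to collapsing the full ranked result list first and only then cutting out the requested page. A minimal sketch under that assumption, keeping one hit per field value (collapse.max = 1); page and rows are hypothetical parameters mirroring Solr's start/rows paging, and this says nothing about how the patch itself pages.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: collapse over all hits, then paginate the collapsed list. */
public final class CollapseThenPage {

    public record Hit(String id, String fieldValue) {}

    /** Keeps the first hit per field value across the whole ranked list. */
    static List<Hit> collapse(List<Hit> ranked) {
        Set<String> seen = new HashSet<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit h : ranked) {
            if (seen.add(h.fieldValue())) { // add() is false if the value was already seen
                kept.add(h);
            }
        }
        return kept;
    }

    /** Page numbers start at 1; `rows` hits per page. */
    public static List<Hit> page(List<Hit> ranked, int page, int rows) {
        List<Hit> collapsed = collapse(ranked); // collapse everything first
        int start = Math.min((page - 1) * rows, collapsed.size());
        int end = Math.min(start + rows, collapsed.size());
        return collapsed.subList(start, end);
    }
}
```

With hits a:X, b:X, c:Y, d:X, e:Z and two rows per page, page 1 is a, c and page 2 is e. Paging first and collapsing afterwards would instead show c and d on page 2, even though d repeats the value X from page 1.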

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754747#action_12754747 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Thomas. I tried to reproduce something similar here, but I did not run into the problems you described. Can you tell me what the field types are for your sort field and collapse field?



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Chad Kouse (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789550#action_12789550 ] 

Chad Kouse commented on SOLR-236:
---------------------------------

Just wanted to comment that I am experiencing the same behavior as Marc Menghin above (NPE). The patch did NOT apply cleanly (1 hunk failed), though I couldn't really tell why, since it looked like it should have worked. I just manually copied the hunk into the right class. Sorry I didn't note what failed.



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Paul Nelson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753335#action_12753335 ] 

Paul Nelson edited comment on SOLR-236 at 9/9/09 5:07 PM:
----------------------------------------------------------

Hey All: Just upgraded to 1.4 to get the new patch (many thanks, Martijn). The new algorithm appears to be sensitive to the size and complexity of the query (rather than simply the number of documents). Should this be the case? Unfortunately, we have rather large and complex queries with dozens of terms and several phrases. These queries take <0.5 sec without collapsing but 3-4 sec with collapsing. Meanwhile, collapsing with *:* or other simple queries comes back in <0.5 sec, so it appears to be primarily a query-complexity issue.

I'm wondering if the filter cache (or some other cache) might be able to help with this situation?



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797794#action_12797794 ] 

Martijn van Groningen edited comment on SOLR-236 at 1/7/10 9:28 PM:
--------------------------------------------------------------------

bq. The result document of our prefix query, which was at position 1 without collapsing, was not even within the top 10 results with collapsing. We were using the option collapse.maxdocs=150. After changing this option to 15000, the results seem to be as expected. Because of that, we concluded that there has to be a problem with the sorting of the uncollapsed docset.

collapse.maxdocs aborts collapsing once the threshold is reached, but it does so based on the uncollapsed docset, which is not sorted in any way. As a result, documents that would normally appear on the first page don't appear in the search result at all. In the end, the collapse component uses the collapsed docset as the result set, not the uncollapsed docset.
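A simplified sketch of the effect described above. It assumes a hypothetical Doc type holding an internal doc id and a score, and it ignores grouping entirely. Cutting off at maxDocs before sorting examines documents in index order, so a high-scoring document beyond the cutoff is never seen. Sorting first keeps that document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MaxDocsCutoff {

    // Hypothetical model of a matching document: internal doc id plus relevance score.
    static final class Doc {
        final int id;
        final float score;
        Doc(int id, float score) { this.id = id; this.score = score; }
    }

    // Examine only the first maxDocs matches in doc id order (unrelated to relevance),
    // then rank what was examined by descending score.
    static List<Integer> cutoffBeforeSort(List<Doc> matches, int maxDocs) {
        List<Doc> examined = new ArrayList<>(matches.subList(0, Math.min(maxDocs, matches.size())));
        examined.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
        return ids(examined);
    }

    // Rank all matches by descending score first, then keep the top maxDocs.
    static List<Integer> cutoffAfterSort(List<Doc> matches, int maxDocs) {
        List<Doc> ranked = new ArrayList<>(matches);
        ranked.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
        return ids(ranked.subList(0, Math.min(maxDocs, ranked.size())));
    }

    private static List<Integer> ids(List<Doc> docs) {
        List<Integer> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.id);
        }
        return out;
    }
}
```

Consider four matches in doc id order scored 0.1, 0.2, 0.3 and 0.9, with maxDocs=2. The first method keeps ids [1, 0] and never sees the best match. The second keeps [3, 2].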

bq. Also, we noticed a huge memory leak problem when using collapsing. We configured the component with <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>. Without setting the option collapse.field, it works normally, and so far there are no memory problems. When the Solr server receives requests with collapsing enabled, the whole memory fills up after a few requests (old gen could not be freed; eden space is heavily in use; ...). Using a profiler, we noticed that the filterCache was extraordinarily large. We suspected a caching problem (the collapse cache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field collapse cache. This has to be addressed, and it certainly will be in the new field-collapse implementation. In the patch you're using, too much is cached (some data need not be cached at all). Also, in some cases strings are cached that could be replaced with hash codes.

bq. Additionally, it might be very useful if the parameter collapse=true|false worked again and could be used to enable/disable the collapsing functionality. Currently, the presence of a field chosen for collapsing enables this feature, and there is no way to configure the fields for collapsing within the request handlers. With that parameter, we could configure the fields in the handler and only enable/disable collapsing per request, as is conventional for other components (highlighting, faceting, ...).

It makes sense to use the collapse.enable parameter again in the patch.
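Here is one possible shape for the switch discussed here, as a hedged sketch. The handler defaults supply collapse.field, a request-level collapse.enable turns collapsing on, and request values override defaults. The plain Map parameters and the class and method names are hypothetical stand-ins for Solr's own parameter classes. Defaulting collapse.enable to false is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the patch's API: parameters are plain string maps.
final class CollapseSwitch {

    // Merge handler defaults with request parameters; request values win.
    static Map<String, String> merge(Map<String, String> defaults, Map<String, String> request) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(request);
        return merged;
    }

    // Collapsing runs only if it is switched on AND a collapse field is configured.
    // Assumption: collapse.enable defaults to false when absent.
    static boolean collapsingActive(Map<String, String> params) {
        boolean enabled = Boolean.parseBoolean(params.getOrDefault("collapse.enable", "false"));
        String field = params.get("collapse.field");
        return enabled && field != null && !field.isEmpty();
    }
}
```

With collapse.field=site in the handler defaults, a request without collapse.enable is not collapsed, and a request with collapse.enable=true is.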

Martijn



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-236:
---------------------------------------

    Attachment: SOLR-236.patch

Patch in sync with trunk.

# CollapseComponent is PluginInfoInitialized. Removed changes to SolrConfig. Note that the collapseCollectorFactories array and the separate fieldCollapsing element have been removed from the configuration. This patch uses the following configuration:
{code:xml}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
    <collapseCollectorFactory name="groupDocumentsCounts" class="solr.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupFieldValue" class="solr.fieldcollapse.collector.FieldValueCountCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupDocumentsFields" class="solr.fieldcollapse.collector.DocumentFieldsCollapseCollectorFactory" />

    <collapseCollectorFactory name="groupAggregatedData" class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
        <lst name="aggregateFunctions">
            <str name="sum">org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction</str>
            <str name="avg">org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction</str>
            <str name="min">org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction</str>
            <str name="max">org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction</str>
        </lst>
    </collapseCollectorFactory>

    <fieldCollapseCache
        class="solr.FastLRUCache"
        size="512"
        initialSize="512"
        autowarmCount="128"/>
</searchComponent>
{code}

# I couldn't find where the fieldCollapseCache was being regenerated. It seems it is not being thrown away after commits? I have changed it to be re-created on the newSearcher event.
# Removed changes to JettySolrRunner, CoreContainer and SolrDispatchFilter for the distributed test case. We will refactor it to use BaseDistributedSearchTestCase (not implemented yet).
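The aggregateFunctions list in the configuration above registers sum, avg, min and max. Here is a rough, hypothetical illustration of what those compute per collapse group. It is not the AggregateCollapseCollectorFactory API. It keys a numeric field by the collapse-field value and uses java.util.DoubleSummaryStatistics.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAggregates {

    // Hypothetical row: the collapse-field value and a numeric field to aggregate.
    static final class Row {
        final String group;
        final double value;
        Row(String group, double value) { this.group = group; this.value = value; }
    }

    // One DoubleSummaryStatistics per group gives sum, average, min and max together.
    static Map<String, DoubleSummaryStatistics> aggregate(List<Row> rows) {
        Map<String, DoubleSummaryStatistics> stats = new LinkedHashMap<>();
        for (Row row : rows) {
            stats.computeIfAbsent(row.group, g -> new DoubleSummaryStatistics()).accept(row.value);
        }
        return stats;
    }
}
```

For rows (a, 1.0), (a, 3.0) and (b, 5.0), group a has sum 4.0, average 2.0, min 1.0 and max 3.0.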



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Iván de Prado (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647335#action_12647335 ] 

Iván de Prado commented on SOLR-236:
------------------------------------

I attached a patch named collapsing-patch-to-1.3.0-ivan.patch. The patch applies to Solr 1.3.0.

Karsten wrote in the comment "Karsten Sperling - 06/Nov/07 02:06 PM":
{quote}
Inverted the logic of the filter DocSet created by CollapseFilter to contain the documents that are to be collapsed instead of the ones that are to be kept. Without this collapse.maxdocs doesn't work.
{quote}

I found that this approach consumes a lot of memory, even when the query matches only a few documents. I also found that there is no advantage to using collapse.maxdocs if it doesn't speed up queries and reduce the amount of memory needed.

So I decided to revert Karsten's change in order to make field collapsing faster and less resource-intensive when querying smaller datasets.

WARNING: This patch changes the semantics of collapse.maxdocs. Before this patch, collapse.maxdocs only reduced the number of documents checked for grouping. The remaining documents, which were not grouped, were still presented in the result.

With the current patch, only documents that were examined for grouping can appear in the result. These semantics have two benefits:
- The amount of resources can be controlled per query.
- No ungrouped content is presented.
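To make the semantic change concrete, here is a minimal sketch that models each document only by its collapse-field value, in the order the documents are examined. The names are hypothetical, and the grouping simply keeps the first document per value. The old behaviour groups the first maxDocs documents and appends the rest ungrouped. The new behaviour returns only the grouped prefix.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical model: each matching document reduced to its collapse-field value,
// in the order the collapser examines them.
final class MaxDocsSemantics {

    // Collapse to one entry per distinct value; the first occurrence wins.
    private static List<String> collapse(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // Old semantics as described: group the first maxDocs documents,
    // then append the remaining documents ungrouped.
    static List<String> groupPrefixKeepRest(List<String> values, int maxDocs) {
        int cut = Math.min(maxDocs, values.size());
        List<String> result = collapse(values.subList(0, cut));
        result.addAll(values.subList(cut, values.size()));
        return result;
    }

    // New semantics as described: only examined (grouped) documents can appear.
    static List<String> groupPrefixOnly(List<String> values, int maxDocs) {
        return collapse(values.subList(0, Math.min(maxDocs, values.size())));
    }
}
```

For values [a, a, b, a, c] and maxDocs=3, the old behaviour yields [a, b, a, c], including a second, ungrouped a. The new one yields [a, b].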




[jira] Commented: (SOLR-236) Field collapsing

Posted by "Michael Gundlach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775192#action_12775192 ] 

Michael Gundlach commented on SOLR-236:
---------------------------------------

I've found an NPE that occurs when performing quasi-distributed field collapsing.

My company has only one use case for field collapsing: collapsing on email address. Our index is spread across multiple cores. We found that if we shard by email address, so that all documents with a given email address are guaranteed to appear on the same core, then we can do distributed field collapsing.

We add &collapse.field=email and &shards=core1,core2,... to a regular query.  Each core collapses on email and sends the results back to the requestor.  Since no emails appear on more than one core, we've accomplished distributed search.  We do lose the <collapse_count> section, but that's not needed for our purpose -- we just need an accurate total document count, and to have no more than one document for a given email address in the results.

Unfortunately, this throws an NPE when searching on a tokenized field. Searching string fields is fine. I don't understand exactly why the NPE appears, but I put a band-aid over it by explicitly checking for nulls at the appropriate line in the code. No more NPE.

There's a downside. If we attempt to collapse on a field other than email (one whose values appear on documents in multiple cores), the results are buggy. The first search returns few documents, and the number of documents actually displayed doesn't always match the "numFound" value. Then, upon refresh, we get what we think is the correct numFound and the correct list of documents. This doesn't bother me too much, as you're guaranteed to get incorrect answers from the collapse code anyway when collapsing on a field that you didn't use as your sharding key.
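Here is a toy model of why sharding by the collapse key makes per-core collapsing exact, and why other fields give inflated counts. Each shard collapses locally, and the requester just sums the per-shard counts. The routing functions and the hash-based shard choice are illustrative assumptions, not how Solr distributes documents.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

final class ShardedCollapse {

    // Route on the collapse key itself, so all documents with one email share a shard.
    static List<List<String>> distributeByKey(List<String> emails, int numShards) {
        List<List<String>> shards = emptyShards(numShards);
        for (String email : emails) {
            shards.get(Math.floorMod(email.hashCode(), numShards)).add(email);
        }
        return shards;
    }

    // Contrast: round-robin routing ignores the key, so one email can land on several shards.
    static List<List<String>> distributeRoundRobin(List<String> emails, int numShards) {
        List<List<String>> shards = emptyShards(numShards);
        for (int i = 0; i < emails.size(); i++) {
            shards.get(i % numShards).add(emails.get(i));
        }
        return shards;
    }

    // Each shard collapses locally (one entry per distinct email), and the
    // requester simply adds up the per-shard counts with no cross-shard merge.
    static int mergedCount(List<List<String>> shards) {
        int total = 0;
        for (List<String> shard : shards) {
            total += new HashSet<>(shard).size();
        }
        return total;
    }

    private static List<List<String>> emptyShards(int numShards) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new ArrayList<>());
        }
        return shards;
    }
}
```

Take two copies each of two addresses. Routing by key yields a merged count of 2, regardless of which shard each address hashes to. Round-robin routing yields 4, because each address is counted once per shard it lands on.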

In the spirit of Yonik's law of patches, I have made two imperfect patches attempting to contribute the fix, or at least point out the error:

1. I pulled trunk, applied the latest SOLR-236 patch, made my 2 line change, and created a patch file.  The resultant patch file looks very different from the latest SOLR-236 patchfile, so I assume I did something wrong.

2. I pulled trunk, made my 2 line change, and created another patch file.  This file is tiny but of course is missing all of the field collapsing changes.

Would you like me to post either of these patch files to this issue? Or is it sufficient to just tell you that the NPE occurred in QueryComponent.java on line 556 ("rb._responseDocs.set(sdoc.positionInResponse, doc);", where sdoc was null)? Perhaps my use case is unusual enough that you're happy leaving the NPE in place and telling other users not to do what I'm doing?

Thanks!
Michael

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)
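A minimal Java sketch of the difference between the two collapse.type values. It assumes collapse.max means "keep at most this many documents per group" and takes the collapse-field values of already-ranked results as input. The class and method names (CollapseSketch, collapseNormal, collapseAdjacent) are hypothetical; this is not the patch's implementation, only an illustration of the semantics described above. A request using these parameters would look something like select?q=*:*&collapse.field=site&collapse.type=adjacent&collapse.max=1, where the field name "site" is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not the patch's code. Input: the collapse-field value of each
// hit, in ranked order. Output: the positions of the hits that survive collapsing.
public class CollapseSketch {

    // collapse.type=normal: at most `max` hits per field value across the whole result list.
    static List<Integer> collapseNormal(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            int count = seen.merge(values.get(i), 1, Integer::sum);
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    // collapse.type=adjacent: only runs of consecutive hits with the same value are
    // collapsed; each run keeps at most `max` hits.
    static List<Integer> collapseAdjacent(List<String> values, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < values.size(); i++) {
            String current = values.get(i);
            run = (i > 0 && Objects.equals(current, previous)) ? run + 1 : 1;
            previous = current;
            if (run <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("a.com", "a.com", "b.com", "a.com");
        System.out.println(collapseNormal(sites, 1));   // [0, 2]
        System.out.println(collapseAdjacent(sites, 1)); // [0, 2, 3]
    }
}
```

With a maximum of 1, normal collapsing keeps only the first a.com hit anywhere in the list, while adjacent collapsing lets a.com reappear once b.com has interrupted the run.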

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: SOLR-236.patch

I agree! I've updated the patch to add a check that the collapse field is indexed; if it is not, an exception is thrown.
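A minimal sketch of the kind of up-front check described above. It assumes the schema can be reduced to a map from field name to an "indexed" flag; in Solr that flag lives on SchemaField. The class name, the unknown-field branch and the use of IllegalArgumentException are illustrative assumptions, and the patch may report the error differently.

```java
import java.util.Map;

// Hypothetical sketch: refuse to collapse on a field that is not indexed.
// The schema is modelled as a plain map from field name to its "indexed" flag.
public class CollapseFieldCheck {

    static void checkCollapseField(String field, Map<String, Boolean> indexedByField) {
        Boolean indexed = indexedByField.get(field);
        if (indexed == null) {
            throw new IllegalArgumentException("Unknown collapse field: " + field);
        }
        if (!indexed) {
            throw new IllegalArgumentException("Collapse field is not indexed: " + field);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> schema = Map.of("site", true, "body_stored_only", false);
        checkCollapseField("site", schema); // passes silently
        try {
            checkCollapseField("body_stored_only", schema);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Collapse field is not indexed: body_stored_only
        }
    }
}
```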

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Abdul Chaudhry (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751243#action_12751243 ] 

Abdul Chaudhry commented on SOLR-236:
-------------------------------------

I have some ideas for performance improvements.

I noticed that the code fetches the field cache twice: once for the collapse and again for the response object (assuming you asked for the info count in the response).

That seems expensive, especially for real-time content.

I think it's better to use FieldCache.StringIndex instead of returning a large string array, and to keep it around for both the collapse and the response object.

I changed the code to keep the cache around, like so:

  /**
   * Keep the field cached for the collapsed fields for the response object as well
   */
  private FieldCache.StringIndex collapseIndex;


When collapsing, you can get the current value with something like this, and remove the code that passes the array around:

      int currentId = i.nextDoc();
      String currentValue = collapseIndex.lookup[collapseIndex.order[currentId]];

When building the response for the info count, you can reference the same cache like so:

          if (collapseInfoCount) {
            resCount.add(collapseFieldType.indexedToReadable(
              collapseIndex.lookup[collapseIndex.order[id]]), count);
          }
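The idea above, fetching the StringIndex once and using it both for collapsing and for the info-count response, can be sketched in plain Java. The StringIndex class below is a hypothetical stand-in that mimics the lookup/order layout used in the snippets (sorted unique terms, plus a per-document ordinal into them, with ordinal 0 meaning "no value"); it does not call Lucene. It also assumes the info count means "documents collapsed away per value", which may differ from what the patch reports.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class SharedStringIndexSketch {

    // Hypothetical stand-in for FieldCache.StringIndex: sorted unique terms plus a per-doc ordinal.
    static final class StringIndex {
        final String[] lookup; // lookup[0] is null and stands for "no value"
        final int[] order;     // order[doc] is an index into lookup
        StringIndex(String[] lookup, int[] order) {
            this.lookup = lookup;
            this.order = order;
        }
    }

    // In Solr the field cache builds this once per searcher; here it is built from an array.
    static StringIndex build(String[] docValues) {
        TreeSet<String> terms = new TreeSet<>();
        for (String v : docValues) {
            if (v != null) terms.add(v);
        }
        String[] lookup = new String[terms.size() + 1];
        Map<String, Integer> ordOf = new HashMap<>();
        int ord = 1;
        for (String t : terms) {
            lookup[ord] = t;
            ordOf.put(t, ord++);
        }
        int[] order = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            order[doc] = docValues[doc] == null ? 0 : ordOf.get(docValues[doc]);
        }
        return new StringIndex(lookup, order);
    }

    // Collapse pass: compares int ordinals and records each group's size in groupSizes.
    // Documents without a value share ordinal 0 and are collapsed together in this sketch.
    static List<Integer> collapse(StringIndex index, int[] rankedDocs, int max, int[] groupSizes) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : rankedDocs) {
            int ord = index.order[doc];
            groupSizes[ord]++;
            if (groupSizes[ord] <= max) kept.add(doc);
        }
        return kept;
    }

    // Response pass: reuses the same index to map ordinals back to strings (no second cache fetch).
    static Map<String, Integer> collapsedCounts(StringIndex index, int[] groupSizes, int max) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int ord = 1; ord < groupSizes.length; ord++) { // ordinal 0 (no value) is skipped
            int collapsed = groupSizes[ord] - max;
            if (collapsed > 0) counts.put(index.lookup[ord], collapsed);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] siteByDoc = {"a.com", "b.com", "a.com", null, "a.com"};
        StringIndex index = build(siteByDoc);
        int[] ranked = {4, 0, 1, 2, 3};
        int[] groupSizes = new int[index.lookup.length];
        System.out.println(collapse(index, ranked, 1, groupSizes)); // [4, 1, 3]
        System.out.println(collapsedCounts(index, groupSizes, 1));  // {a.com=2}
    }
}
```

Comparing int ordinals during the collapse pass also avoids string comparisons, and the response pass only resolves ordinals back to strings for the groups it actually reports.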

I also added timing for the cache access, as it can be slow if you are doing a lot of updates.

I have also added code for displaying selected fields for the duplicates, but it's difficult to submit. I hope this gets committed: it's hard to submit a patch when the feature isn't in svn, and I can't submit a patch to a patch to a patch... you get the idea.


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Peter Karich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835258#action_12835258 ] 

Peter Karich edited comment on SOLR-236 at 2/18/10 4:07 PM:
------------------------------------------------------------

I tried the latest patch from 1 Feb 2010. It compiles against solr-2010-02-13 from the nightly build directory, but it does not work. If I query

http://server/solr-app/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    ...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(

      was (Author: peathal):
    I tried the latest patch from 1 Feb 2010; it compiles against solr-2010-02-13 from the nightly build but does not work. If I query

http://server/solr-app/select?q=*:*&collapse.field=myfield

it fails with: 

{noformat} 

HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.toExternal(FieldType.java:329)
    at org.apache.solr.schema.FieldType.storedToReadable(FieldType.java:348)
    at org.apache.solr.search.fieldcollapse.collector.AbstractCollapseCollector.getCollapseGroupResult(AbstractCollapseCollector.java:58)
    at org.apache.solr.search.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory$DocumentCountCollapseCollector.getResult(DocumentGroupCountCollapseCollectorFactory.java:84)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.getCollapseInfo(AbstractDocumentCollapser.java:193)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:192)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    ...
 {noformat} 


I only need the OutOfMemory problem solved ... :-(
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Peter Karich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Karich updated SOLR-236:
------------------------------

    Attachment: NonAdjacentDocumentCollapserTest.java
                NonAdjacentDocumentCollapser.java
                DocSetScoreCollector.java

It seems to me that the provided changes are necessary to make the OutOfMemory exception go away. Please apply the files with caution, because I made the changes against an old patch (from Nov 2009).

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: field_collapsing.patch

Corrects a bug in the previous version when using a value greater than 1 for the collapse.max parameter.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792518#action_12792518 ] 

Mark Miller edited comment on SOLR-236 at 12/18/09 4:12 PM:
------------------------------------------------------------

bq. I very much disagree with a policy blocking non-production-ready code from being in source control

Just to be clear, there is no such policy that I've seen - each decision just comes down to consensus. And as far as I know, our branch policy is pretty much "anything goes" - trunk is very different than svn. Anyone (anyone with access to svn that is) can play around with a branch for anything if they want.


I agree with your thoughts on a branch - if the argument is, we want it to be easier for devs to check out and work on this, or for users to checkout and build this without applying patches, why not just make a branch? Merging is annoying but not difficult - I've been doing plenty of branch merging lately, and while it's not glorious work, modern tools make it more of a grind than a challenge.

      was (Author: markrmiller@gmail.com):
    bq. I very much disagree with a policy blocking non-production-ready code from being in source control

Just to be clear, there is no such policy that I've seen - each decision just comes down to consensus. And as far as I know, our branch policy is pretty much "anything goes" - trunk is very different than svn. Anyone can play around with a branch for anything if they want.


I agree with your thoughts on a branch - if the argument is, we want it to be easier for devs to check out and work on this, or for users to checkout and build this without applying patches, why not just make a branch? Merging is annoying but not difficult - I've been doing plenty of branch merging lately, and while it's not glorious work, modern tools make it more of a grind than a challenge.
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716441#action_12716441 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Stephan, when I ran performance tests on the latest patch for normal collapsing (not adjacent collapsing), I found a significant performance improvement in field collapsing compared to the old patch. This holds both with and without sorting specified in the request. If you have other questions or comments about the latest patch, just ask.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-236:
----------------------------

    Comment: was deleted

(was: Hi experts,
 thanks for the great work!
 I downloaded Solr 1.4 from http://apache.freelamp.com/lucene/solr/1.4.0/apache-solr-1.4.0.zip
and applied this patch: SOLR-236.patch (2009-12-18 10:16 AM, Shalin Shekhar Mangar)
like this:
G:\doc\apache-solr-1.4.0>patch.exe -p0 < SOLR-236.patch

It shows some errors. Does this patch (SOLR-236.patch, 2009-12-18 10:16 AM) not support Solr 1.4?


and the result is:
patching file src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml
patching file src/test/test-files/solr/conf/schema-fieldcollapse.xml
patching file src/test/test-files/solr/conf/solrconfig.xml
patching file src/test/test-files/fieldcollapse/testResponse.xml
can't find file to patch at input line 787
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|
|Property changes on: src/test/test-files/fieldcollapse/testResponse.xml
|___________________________________________________________________
|Added: svn:keywords
|   + Date Author Id Revision HeadURL
|Added: svn:eol-style
|   + native
|
|Index: src/test/org/apache/solr/BaseDistributedSearchTestCase.java
|===================================================================
|--- src/test/org/apache/solr/BaseDistributedSearchTestCase.java(revision 891214)
|+++ src/test/org/apache/solr/BaseDistributedSearchTestCase.java(working copy)
--------------------------
File to patch: SOLR-236.patch
S: No such file or directory
Skip this patch? [y] y
Skipping patch.
2 out of 2 hunks ignored
patching file src/test/org/apache/solr/search/fieldcollapse/FieldCollapsingIntegrationTest.java
patching file src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java
patching file src/test/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapserTest.java
patching file src/test/org/apache/solr/search/fieldcollapse/AdjacentCollapserTest.java
patching file src/test/org/apache/solr/handler/component/CollapseComponentTest.java
patching file src/test/org/apache/solr/client/solrj/response/FieldCollapseResponseTest.java
patching file src/java/org/apache/solr/search/DocSetAwareCollector.java
patching file src/java/org/apache/solr/search/fieldcollapse/CollapseGroup.java
patching file src/java/org/apache/solr/search/fieldcollapse/DocumentCollapseResult.java
patching file src/java/org/apache/solr/search/fieldcollapse/DocumentCollapser.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/CollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/DocumentGroupCountCollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/AverageFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/MinFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/SumFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/MaxFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/AggregateFunction.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/CollapseContext.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/DocumentFieldsCollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/AggregateCollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/CollapseCollector.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/FieldValueCountCollapseCollectorFactory.java
patching file src/java/org/apache/solr/search/fieldcollapse/collector/AbstractCollapseCollector.java
patching file src/java/org/apache/solr/search/fieldcollapse/AbstractDocumentCollapser.java
patching file src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java
patching file src/java/org/apache/solr/search/fieldcollapse/AdjacentDocumentCollapser.java
patching file src/java/org/apache/solr/search/fieldcollapse/util/Counter.java
patching file src/java/org/apache/solr/search/SolrIndexSearcher.java
patching file src/java/org/apache/solr/search/DocSetHitCollector.java
patching file src/java/org/apache/solr/handler/component/CollapseComponent.java
patching file src/java/org/apache/solr/handler/component/QueryComponent.java
Hunk #5 succeeded at 521 with fuzz 2.
Hunk #6 succeeded at 562 (offset -5 lines).
patching file src/java/org/apache/solr/util/DocSetScoreCollector.java
patching file src/common/org/apache/solr/common/params/CollapseParams.java
patching file src/solrj/org/apache/solr/client/solrj/SolrQuery.java
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 50.
Hunk #3 FAILED at 76.
Hunk #4 FAILED at 148.
Hunk #5 FAILED at 197.
Hunk #6 succeeded at 510 (offset -155 lines).
Hunk #7 succeeded at 566 (offset -155 lines).
5 out of 7 hunks FAILED -- saving rejects to file src/solrj/org/apache/solr/client/solrj/SolrQuery.java.rej
patching file src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java
Hunk #1 FAILED at 47.
Hunk #2 FAILED at 63.
Hunk #3 succeeded at 122 with fuzz 2 (offset -8 lines).
Hunk #4 succeeded at 320 with fuzz 2 (offset 17 lines).
2 out of 4 hunks FAILED -- saving rejects to file src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java.rej
patching file src/solrj/org/apache/solr/client/solrj/response/FieldCollapseResponse.java

and in src/solrj/org/apache/solr/client/solrj/SolrQuery.java.rej 

***************
*** 17,28 ****
  
  package org.apache.solr.client.solrj;
  
- import org.apache.solr.common.params.CommonParams;
- import org.apache.solr.common.params.FacetParams;
- import org.apache.solr.common.params.HighlightParams;
- import org.apache.solr.common.params.ModifiableSolrParams;
- import org.apache.solr.common.params.StatsParams;
- import org.apache.solr.common.params.TermsParams;
  
  import java.util.regex.Pattern;
  
--- 17,23 ----
  
  package org.apache.solr.client.solrj;
  
+ import org.apache.solr.common.params.*;
  
  import java.util.regex.Pattern;
  
***************
*** 55,62 ****
      this.set(CommonParams.Q, q);
    }
  
-   /** enable/disable terms.  
-    * 
     * @param b flag to indicate terms should be enabled. <br /> if b==false, removes all other terms parameters
     * @return Current reference (<i>this</i>)
     */
--- 50,57 ----
      this.set(CommonParams.Q, q);
    }
  
+   /** enable/disable terms.
+    *
     * @param b flag to indicate terms should be enabled. <br /> if b==false, removes all other terms parameters
     * @return Current reference (<i>this</i>)
     */
***************
*** 81,150 ****
      }
      return this;
    }
-   
    public boolean getTerms() {
      return this.getBool(TermsParams.TERMS, false);
    }
-   
    public SolrQuery addTermsField(String field) {
      this.add(TermsParams.TERMS_FIELD, field);
      return this;
    }
-   
    public String[] getTermsFields() {
      return this.getParams(TermsParams.TERMS_FIELD);
    }
-   
    public SolrQuery setTermsLower(String lower) {
      this.set(TermsParams.TERMS_LOWER, lower);
      return this;
    }
-   
    public String getTermsLower() {
      return this.get(TermsParams.TERMS_LOWER, "");
    }
-   
    public SolrQuery setTermsUpper(String upper) {
      this.set(TermsParams.TERMS_UPPER, upper);
      return this;
    }
-   
    public String getTermsUpper() {
      return this.get(TermsParams.TERMS_UPPER, "");
    }
-   
    public SolrQuery setTermsUpperInclusive(boolean b) {
      this.set(TermsParams.TERMS_UPPER_INCLUSIVE, b);
      return this;
    }
-   
    public boolean getTermsUpperInclusive() {
      return this.getBool(TermsParams.TERMS_UPPER_INCLUSIVE, false);
    }
-   
    public SolrQuery setTermsLowerInclusive(boolean b) {
      this.set(TermsParams.TERMS_LOWER_INCLUSIVE, b);
      return this;
    }
-   
    public boolean getTermsLowerInclusive() {
      return this.getBool(TermsParams.TERMS_LOWER_INCLUSIVE, true);
    }
-  
    public SolrQuery setTermsLimit(int limit) {
      this.set(TermsParams.TERMS_LIMIT, limit);
      return this;
    }
-   
    public int getTermsLimit() {
      return this.getInt(TermsParams.TERMS_LIMIT, 10);
    }
-  
    public SolrQuery setTermsMinCount(int cnt) {
      this.set(TermsParams.TERMS_MINCOUNT, cnt);
      return this;
    }
-   
    public int getTermsMinCount() {
      return this.getInt(TermsParams.TERMS_MINCOUNT, 1);
    }
--- 76,145 ----
      }
      return this;
    }
+ 
    public boolean getTerms() {
      return this.getBool(TermsParams.TERMS, false);
    }
+ 
    public SolrQuery addTermsField(String field) {
      this.add(TermsParams.TERMS_FIELD, field);
      return this;
    }
+ 
    public String[] getTermsFields() {
      return this.getParams(TermsParams.TERMS_FIELD);
    }
+ 
    public SolrQuery setTermsLower(String lower) {
      this.set(TermsParams.TERMS_LOWER, lower);
      return this;
    }
+ 
    public String getTermsLower() {
      return this.get(TermsParams.TERMS_LOWER, "");
    }
+ 
    public SolrQuery setTermsUpper(String upper) {
      this.set(TermsParams.TERMS_UPPER, upper);
      return this;
    }
+ 
    public String getTermsUpper() {
      return this.get(TermsParams.TERMS_UPPER, "");
    }
+ 
    public SolrQuery setTermsUpperInclusive(boolean b) {
      this.set(TermsParams.TERMS_UPPER_INCLUSIVE, b);
      return this;
    }
+ 
    public boolean getTermsUpperInclusive() {
      return this.getBool(TermsParams.TERMS_UPPER_INCLUSIVE, false);
    }
+ 
    public SolrQuery setTermsLowerInclusive(boolean b) {
      this.set(TermsParams.TERMS_LOWER_INCLUSIVE, b);
      return this;
    }
+ 
    public boolean getTermsLowerInclusive() {
      return this.getBool(TermsParams.TERMS_LOWER_INCLUSIVE, true);
    }
+ 
    public SolrQuery setTermsLimit(int limit) {
      this.set(TermsParams.TERMS_LIMIT, limit);
      return this;
    }
+ 
    public int getTermsLimit() {
      return this.getInt(TermsParams.TERMS_LIMIT, 10);
    }
+ 
    public SolrQuery setTermsMinCount(int cnt) {
      this.set(TermsParams.TERMS_MINCOUNT, cnt);
      return this;
    }
+ 
    public int getTermsMinCount() {
      return this.getInt(TermsParams.TERMS_MINCOUNT, 1);
    }
***************
*** 153,186 ****
      this.set(TermsParams.TERMS_MAXCOUNT, cnt);
      return this;
    }
-   
    public int getTermsMaxCount() {
      return this.getInt(TermsParams.TERMS_MAXCOUNT, -1);
    }
-   
    public SolrQuery setTermsPrefix(String prefix) {
      this.set(TermsParams.TERMS_PREFIX_STR, prefix);
      return this;
    }
-   
    public String getTermsPrefix() {
      return this.get(TermsParams.TERMS_PREFIX_STR, "");
    }
-   
    public SolrQuery setTermsRaw(boolean b) {
      this.set(TermsParams.TERMS_RAW, b);
      return this;
    }
-   
    public boolean getTermsRaw() {
      return this.getBool(TermsParams.TERMS_RAW, false);
    }
-  
    public SolrQuery setTermsSortString(String type) {
      this.set(TermsParams.TERMS_SORT, type);
      return this;
    }
-   
    public String getTermsSortString() {
      return this.get(TermsParams.TERMS_SORT, TermsParams.TERMS_SORT_COUNT);
    }
--- 148,181 ----
      this.set(TermsParams.TERMS_MAXCOUNT, cnt);
      return this;
    }
+ 
    public int getTermsMaxCount() {
      return this.getInt(TermsParams.TERMS_MAXCOUNT, -1);
    }
+ 
    public SolrQuery setTermsPrefix(String prefix) {
      this.set(TermsParams.TERMS_PREFIX_STR, prefix);
      return this;
    }
+ 
    public String getTermsPrefix() {
      return this.get(TermsParams.TERMS_PREFIX_STR, "");
    }
+ 
    public SolrQuery setTermsRaw(boolean b) {
      this.set(TermsParams.TERMS_RAW, b);
      return this;
    }
+ 
    public boolean getTermsRaw() {
      return this.getBool(TermsParams.TERMS_RAW, false);
    }
+ 
    public SolrQuery setTermsSortString(String type) {
      this.set(TermsParams.TERMS_SORT, type);
      return this;
    }
+ 
    public String getTermsSortString() {
      return this.get(TermsParams.TERMS_SORT, TermsParams.TERMS_SORT_COUNT);
    }
***************
*** 202,208 ****
    public String[] getTermsRegexFlags()  {
      return this.getParams(TermsParams.TERMS_REGEXP_FLAG);
    }
-      
    /** Add field(s) for facet computation.
     * 
     * @param fields Array of field names from the IndexSchema
--- 197,203 ----
    public String[] getTermsRegexFlags()  {
      return this.getParams(TermsParams.TERMS_REGEXP_FLAG);
    }
+ 
    /** Add field(s) for facet computation.
     * 
     * @param fields Array of field names from the IndexSchema



in src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java.rej:

***************
*** 47,52 ****
    private NamedList<Object> _spellInfo = null;
    private NamedList<Object> _statsInfo = null;
    private NamedList<Object> _termsInfo = null;
  
    // Facet stuff
    private Map<String,Integer> _facetQuery = null;
--- 47,53 ----
    private NamedList<Object> _spellInfo = null;
    private NamedList<Object> _statsInfo = null;
    private NamedList<Object> _termsInfo = null;
+   private NamedList<Object> _collapseInfo = null;
  
    // Facet stuff
    private Map<String,Integer> _facetQuery = null;
***************
*** 62,68 ****
  
    // Terms Response
    private TermsResponse _termsResponse = null;
-   
    // Field stats Response
    private Map<String,FieldStatsInfo> _fieldStatsInfo = null;
    
--- 63,72 ----
  
    // Terms Response
    private TermsResponse _termsResponse = null;
+ 
+   // Field collapse response
+   private FieldCollapseResponse _fieldCollapseResponse = null;  
+ 
    // Field stats Response
    private Map<String,FieldStatsInfo> _fieldStatsInfo = null;
    

)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
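The two `collapse.type` values are easiest to compare on a small ranked list. Below is a minimal Java sketch of the semantics as described above. It assumes that `collapse.max` is the number of results kept per field value in normal mode, and per run of consecutive equal values in adjacent mode. It also assumes the input is already in final ranked order. `CollapseSketch` and `collapse` are hypothetical names, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of collapse.type / collapse.max semantics; not the SOLR-236 patch code.
final class CollapseSketch {

    /**
     * Returns the positions (in ranked order) of the results that survive collapsing.
     *
     * @param values   collapse-field value of each result, in ranked order (non-null)
     * @param max      results kept per value (normal) or per run of equal values (adjacent)
     * @param adjacent true for "adjacent" mode, false for "normal" mode
     */
    static List<Integer> collapse(List<String> values, int max, boolean adjacent) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> seenPerValue = new HashMap<>(); // used in normal mode
        String previous = null;                              // used in adjacent mode
        int runLength = 0;

        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            int count;
            if (adjacent) {
                // A new run starts whenever the value differs from the previous result.
                runLength = value.equals(previous) ? runLength + 1 : 1;
                previous = value;
                count = runLength;
            } else {
                // Count every occurrence of the value across the whole result list.
                count = seenPerValue.merge(value, 1, Integer::sum);
            }
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For the values `a, a, b, a, a` with `max = 1`, normal mode keeps positions 0 and 2. Adjacent mode keeps 0, 2 and 3, because the last two `a` results form a new run that is not contiguous with the first.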

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793090#action_12793090 ] 

Noble Paul commented on SOLR-236:
---------------------------------

We need to open a separate issue for the core-related changes.



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841558#action_12841558 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Shouldn't the float array in DocSetScoreCollector be changed to a Map instead? That array is the one actually being cached, and it requires the most memory. The float array in NonAdjacentDocumentCollapser.PredefinedScorer isn't cached, though changing it to a Map could be an improvement as well.

bq. I think the compare method should NOT be called if no docs are in the scores array ... ?
I would expect that every docId has a score.
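The array-versus-Map trade-off can be sketched as two implementations of one small interface. This is a hypothetical illustration, not the code of DocSetScoreCollector. `ScoreStore`, `DenseScoreStore` and `SparseScoreStore` are invented names. Returning `Float.NEGATIVE_INFINITY` for documents that were never collected is an assumption of this sketch, chosen so that a comparator never sees a missing score.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch comparing dense and sparse per-document score storage.
interface ScoreStore {
    void put(int docId, float score);
    float get(int docId); // Float.NEGATIVE_INFINITY for documents never collected
}

/** Dense storage: one float per document in the index, whether it matched or not. */
final class DenseScoreStore implements ScoreStore {
    private final float[] scores;

    DenseScoreStore(int maxDoc) {
        scores = new float[maxDoc];
        Arrays.fill(scores, Float.NEGATIVE_INFINITY);
    }

    public void put(int docId, float score) { scores[docId] = score; }
    public float get(int docId) { return scores[docId]; }
}

/** Sparse storage: memory grows with the number of matching documents only. */
final class SparseScoreStore implements ScoreStore {
    private final Map<Integer, Float> scores = new HashMap<>();

    public void put(int docId, float score) { scores.put(docId, score); }
    public float get(int docId) { return scores.getOrDefault(docId, Float.NEGATIVE_INFINITY); }
}
```

The sparse variant only saves memory when the result set is a small fraction of the index. On a typical JVM, each `HashMap` entry with a boxed `Integer` key and `Float` value costs several tens of bytes, whereas the dense array costs 4 bytes per document.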

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: SOLR-236-FieldCollapsing.patch

Right, it's more useful.

This new version includes the result as you expect it.

You should add the following constraint to the wiki: the collapsing field must be untokenized.
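The constraint exists because collapsing needs exactly one key per document. The hypothetical Java sketch below contrasts the key an untokenized field yields with the keys a tokenized field yields. A lowercase/whitespace split stands in for a real analyzer. `CollapseKeys` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration: collapsing needs exactly one key per document.
final class CollapseKeys {

    /** Keys a document contributes when the field is indexed untokenized (the whole value). */
    static List<String> untokenizedKeys(String value) {
        return Collections.singletonList(value);
    }

    /** Keys from a simple lowercase/whitespace tokenizer, standing in for a text field's analyzer. */
    static List<String> tokenizedKeys(String value) {
        List<String> keys = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) { // a leading space yields an empty first token
                keys.add(token);
            }
        }
        return keys;
    }
}
```

In the tokenized variant, `Blue Note` and `Blue Moon` share the key `blue`, and each also has a key of its own, so neither document maps to a single group. In Solr terms this usually means collapsing on a `string`-typed field, for example one populated via `copyField`, rather than on a `text` field.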

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: collapse_field.patch, collapse_field.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing.patch, field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501582 ] 

Yonik Seeley commented on SOLR-236:
-----------------------------------

Oh I see... the modified sort is *just* to build the filter.

The building-the-filter part is a problem though... asking for *all* matching docs in sorted order isn't that scalable.
If we get the interface right though, more efficient implementations can follow.
For that reason, it might be good for implementation details like "collapseCache" to be private.
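One reading of "get the interface right" is a collector-style interface that hides its data structures, so a cheaper implementation can replace a sort-everything one. Below is a hypothetical Java sketch of that idea. `DocumentCollapser` and `BestPerGroupCollapser` are invented names. The implementation covers only the simplest case: normal mode, one document kept per value, ordered by score. It keeps the best document per key in a single pass and sorts only the group heads.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: callers see only collect() and result(), so caching
// and data-structure choices stay private to each implementation.
interface DocumentCollapser {
    void collect(int docId, String key, float score);
    List<Integer> result(); // surviving doc ids, best score first
}

/** Keeps the highest-scoring document per key in one pass, without sorting every match. */
final class BestPerGroupCollapser implements DocumentCollapser {
    private static final class Head {
        final int docId;
        final float score;
        Head(int docId, float score) { this.docId = docId; this.score = score; }
    }

    private final Map<String, Head> heads = new HashMap<>();

    public void collect(int docId, String key, float score) {
        Head current = heads.get(key);
        if (current == null || score > current.score) { // ties keep the first collected doc
            heads.put(key, new Head(docId, score));
        }
    }

    public List<Integer> result() {
        // Only the g group heads are sorted (O(g log g)), not all n matching documents.
        List<Head> sorted = new ArrayList<>(heads.values());
        sorted.sort(Comparator.comparingDouble((Head h) -> h.score).reversed());
        List<Integer> ids = new ArrayList<>();
        for (Head h : sorted) {
            ids.add(h.docId);
        }
        return ids;
    }
}
```

Adjacent collapsing depends on the final sort order of all matches, so it does not fit this single-pass shape. It would need its own implementation behind the same interface.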

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment:     (was: field_collapsing.patch)



[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: SOLR-236-FieldCollapsing.patch

New release:
- Field collapsing added to DisMaxRequestHandler
- Types are correctly handled on the collapsed field

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750582#action_12750582 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Yes, specifying which collapsed fields to return is a good idea, just like the fl parameter for a normal request.
I was thinking about how to fit this new feature into the current patch, and I think it might be a good idea to revise the current field collapse result format so that the results of this feature fit nicely into the response.

Currently the collapse response is like this:
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="doc">
        <int name="233238">1</int>
    </lst>
    <lst name="count">
        <int name="melkweg">1</int>
    </lst>
</lst>
{code}

I think a response format like the following would be more ....
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="">
        <lst name="233238">
            <str name="fieldValue">melkweg</str>
            <int name="collapseCount">2</int>
            <lst name="collapsedValues">
                <str name="price">10.99, "1.999,99"</str>
                <str name="name">adapter, laptop</str>
            </lst>
        </lst>
    </lst>
</lst>
{code}
As you can see, the data is bundled together per document and is therefore easier to parse. The collapsedValues list can have one or more fields, each containing the collapsed field values in a comma-separated format. The _collapsedValues_ element will of course only be added when the client specifies the collapsed fields in the request.
What do you think about this new result format? 
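The proposed per-document entry can be sketched in Java with `LinkedHashMap` standing in for Solr's order-preserving `NamedList`. This is a hypothetical illustration of the format above, not code from the patch. `CollapseEntry`, `build` and `joinValues` are invented names. The price value `1.999,99` in the example contains a comma, so the sketch assumes CSV-style quoting: a value that contains a comma or quote is wrapped in quotes, with embedded quotes doubled. That keeps the comma-separated list unambiguous.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed per-document collapse entry.
// LinkedHashMap stands in for Solr's order-preserving NamedList.
final class CollapseEntry {

    /** Joins values with ", ", quoting any value that itself contains a comma or a quote. */
    static String joinValues(List<String> values) {
        List<String> parts = new ArrayList<>();
        for (String v : values) {
            if (v.contains(",") || v.contains("\"")) {
                parts.add('"' + v.replace("\"", "\"\"") + '"');
            } else {
                parts.add(v);
            }
        }
        return String.join(", ", parts);
    }

    /** Builds the entry stored under a document id such as "233238". */
    static Map<String, Object> build(String fieldValue, int collapseCount,
                                     Map<String, List<String>> collapsedValues) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("fieldValue", fieldValue);
        entry.put("collapseCount", collapseCount);
        if (!collapsedValues.isEmpty()) { // only when the request names fields to return
            Map<String, String> joined = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : collapsedValues.entrySet()) {
                joined.put(e.getKey(), joinValues(e.getValue()));
            }
            entry.put("collapsedValues", joined);
        }
        return entry;
    }
}
```

For the prices `10.99` and `1.999,99`, `joinValues` produces `10.99, "1.999,99"`, which matches the `price` element in the proposed response.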

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)



[jira] Commented: (SOLR-236) Field collapsing

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792514#action_12792514 ] 

David Smiley commented on SOLR-236:
-----------------------------------

I've been watching this thread forever without saying anything, but I want to offer my two cents, and then I'll butt out.

I very much disagree with a policy blocking non-production-ready code from being in source control.  All code starts off this way and it would be quite a shame not to leverage the advantages of source control simply because it isn't ready yet.  If people are uncomfortable with it being in trunk then _simply_ use a branch.  Of course, how simple "simple" is depends on one's comfort with source control and the particular source control technology used and tools to help you (e.g. IDEs).  By the way, git makes "feature branches" (which is what this would be) easy to manage and integrates bidirectionally with subversion.  If you're not comfortable with branching because you're not familiar with it then you need to learn.  By "you" I don't mean anyone in particular, I mean all professional software developers.  Source control and branching are tools of our trade.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)
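
To make the difference between the two `collapse.type` modes concrete, here is a minimal Java sketch. It is not taken from any of the attached patches: the class and method names are hypothetical, and it assumes that in `normal` mode `collapse.max` caps how many hits per field value are shown anywhere in the result, while in `adjacent` mode it only caps runs of consecutive hits with the same value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of collapse.type=normal vs adjacent; not the patch's code.
// Input: the collapse-field value of each hit, in ranked order.
// Output: 0-based positions of the hits that remain visible.
public class CollapseSketch {

    // normal: a value may appear at most `max` times anywhere in the result
    // (assumed reading of collapse.max for this mode).
    static List<Integer> normalCollapse(List<String> rankedValues, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rankedValues.size(); i++) {
            int count = seen.merge(rankedValues.get(i), 1, Integer::sum);
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    // adjacent: only runs of consecutive hits with the same value are cut after `max`.
    static List<Integer> adjacentCollapse(List<String> rankedValues, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < rankedValues.size(); i++) {
            String value = rankedValues.get(i);
            run = (i > 0 && Objects.equals(value, previous)) ? run + 1 : 1;
            previous = value;
            if (run <= max) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For the ranked values `a, a, b, a, a, c` with a max of 1, normal mode keeps positions 0, 2 and 5, while adjacent mode also keeps position 3, because that `a` is no longer next to the previous one.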

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791972#action_12791972 ] 

Grant Ingersoll commented on SOLR-236:
--------------------------------------

I'd define "large scale" for this in a few ways:
1. Lots of docs in the result set (10K+)
2. Lots of overall docs (100M+)
3. Lots of queries (> 10 QPS)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501583 ] 

Emmanuel Keller commented on SOLR-236:
--------------------------------------

Correct, except that the collapse result is only used as a filter on the final result, to hide collapsed documents.

P.S.: Sorry if my answers are a little short; I am not perfectly fluent in English.
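
One way to read this answer: collapsing runs over the whole ranked result and produces a set of hidden documents, and that set is then applied as a filter before the requested page is cut out. The Java sketch below assumes the hidden set is already available as a `java.util.BitSet` of doc ids; `CollapseFilterSketch.page` and its `start`/`rows` arguments are hypothetical names that mirror Solr's paging parameters.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: apply a precomputed "hidden by collapsing" set as a filter,
// then take the requested page (start/rows) from the documents that remain visible.
public class CollapseFilterSketch {

    static List<Integer> page(int[] rankedDocIds, BitSet hidden, int start, int rows) {
        List<Integer> page = new ArrayList<>();
        if (rows <= 0) {
            return page;
        }
        int visible = 0; // number of non-hidden docs seen so far
        for (int docId : rankedDocIds) {
            if (hidden.get(docId)) {
                continue; // collapsed away, never counts toward paging
            }
            if (visible++ < start) {
                continue; // visible, but before the requested page
            }
            page.add(docId);
            if (page.size() == rows) {
                break;
            }
        }
        return page;
    }
}
```

In this sketch, filtering before paging means page boundaries are counted in visible documents, so a collapsed hit never occupies a slot on any page.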


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749464#action_12749464 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Thomas, currently neither collapsing algorithm stores the ids of the collapsed documents.
In order to have this functionality, I think the following has to be done:
1) In the doCollapsing(...) methods of both concrete implementations of DocumentCollapser, the collapsed documents have to be stored. Depending on what you want, you can store them in one big list or store a list per most relevant document. The most relevant document is the document that does *not* collapse.
2) In the getCollapseInfo(...) method in AbstractDocumentCollapser you then need to output these collapsed documents. If you are storing the collapsed documents in one big list, then adding a new NamedList with the collapsed documents would be fine, I guess. If you are storing the collapsed documents per document head, then I would add the collapsed document ids to the existing resDoc named list. It is important that you return the Solr unique id instead of the Lucene id.
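
A minimal Java sketch of steps 1 and 2, under simplifying assumptions: normal collapsing with at most one visible document per group, collapse-field values and Solr unique keys available as plain arrays indexed by Lucene doc id, and a hypothetical `CollapsedIdsSketch` class standing in for the real `DocumentCollapser` implementations. It stores one list per head document and reports everything by unique key rather than by Lucene id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the patch's DocumentCollapser API.
public class CollapsedIdsSketch {

    /**
     * @param rankedDocIds Lucene doc ids in relevance order
     * @param groupValues  collapse-field value, indexed by Lucene doc id
     * @param uniqueKeys   Solr unique key, indexed by Lucene doc id
     * @return head unique key -> unique keys of the docs collapsed under that head
     */
    static Map<String, List<String>> collapsedPerHead(int[] rankedDocIds,
                                                      String[] groupValues,
                                                      String[] uniqueKeys) {
        Map<String, Integer> headOfGroup = new LinkedHashMap<>();
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int docId : rankedDocIds) {
            String group = groupValues[docId];
            Integer head = headOfGroup.get(group);
            if (head == null) {
                // First (most relevant) doc of the group does not collapse: it is the head.
                headOfGroup.put(group, docId);
                result.put(uniqueKeys[docId], new ArrayList<>());
            } else {
                // Later docs of the group collapse; record them by unique key, not Lucene id.
                result.get(uniqueKeys[head]).add(uniqueKeys[docId]);
            }
        }
        return result;
    }
}
```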

This is just one approach, but what is the reason that you want this functionality? I guess what would be much easier is to do a second query after the collapse query. In this second query you disable field collapsing (by not setting collapse.field) and you set fq=[collapse.field]:[collapse.value], for example.
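
As a hedged illustration of that second query, the hypothetical Java helper below builds its query string: `collapse.field` is left out so collapsing is off, and the group is selected with an `fq` of the form `field:"value"`. Quoting and escaping the value is a choice made here so that values containing spaces or colons still parse; the code needs Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: query string for fetching all documents of one collapse group.
public class GroupFollowUpQuery {

    static String build(String q, String collapseField, String collapseValue) {
        // Escape backslashes and quotes so the value can sit inside a quoted term.
        String escaped = collapseValue.replace("\\", "\\\\").replace("\"", "\\\"");
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("fq", collapseField + ":\"" + escaped + "\""); // no collapse.field => no collapsing
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }
}
```

For `q=ipod`, field `site` and value `example.com` it produces `q=ipod&fq=site%3A%22example.com%22`, which would then be appended to the select URL of your Solr instance.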

Potentially the number of collapsed documents can be very large, and in that situation it can have an impact on performance. Therefore I think that this functionality should be disabled by default, in the same way that collapseInfoDoc and collapseInfoCount are managed.


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: SOLR-236-FieldCollapsing.patch

This new patch resolves a performance issue.
I have added timing information for monitoring performance:

<str name="time">57/5</str>

The first value is the elapsed time (in milliseconds) needed to compute the collapse information (CollapseFilter.ajacentCollapse method).
The second value is the elapsed time needed to compute the result information (CollapseFilter.getMoreResults method).
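
A hedged Java sketch of how a two-part value like `57/5` can be produced: time each phase with `System.nanoTime()` and join the millisecond figures with a slash. The two `Runnable` arguments are placeholders for the collapse step and the result-information step; this is not the patch's code.

```java
// Hypothetical sketch of a "collapseMs/resultsMs" timing string.
public class CollapseTiming {

    static String time(Runnable collapseStep, Runnable resultsStep) {
        long t0 = System.nanoTime();
        collapseStep.run();   // stands in for the collapse computation
        long t1 = System.nanoTime();
        resultsStep.run();    // stands in for computing the result information
        long t2 = System.nanoTime();
        return format((t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }

    static String format(long collapseMs, long resultsMs) {
        return collapseMs + "/" + resultsMs;
    }
}
```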

We are using Solr (with the collapsing patch) on a large index in a production environment (120GB with more than 3,000,000 documents).

P.S.: This time, the patch is relative to the solr root directory.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Ron Veenstra (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716412#action_12716412 ] 

Ron Veenstra commented on SOLR-236:
-----------------------------------

Thanks for the replies.

Thomas, I followed your steps, verifying the same Java version and build, etc. (all matched. I'm working with a CentOS 5 machine. Could the problem be related to that?)
Patching and installing all appeared successful, but the resulting Jetty-powered page still showed:

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.CollapseComponent'
[followed by the long line of tracebacks..]

My solrconfig.xml contains the following (in case there is an obvious flaw):


<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" />

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>

  <arr name="components">
    <str>collapse</str>
  </arr>
</requestHandler>


Stephen: I tried your configuration as well, with both the most recent patch and the patch you referenced, but the results were the same.

I am going to try again from scratch on an Ubuntu machine, but any other ideas would be most appreciated.


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

I have attached a new patch that incorporates Micheal's quasi-distributed patch, so you don't have to patch twice. In addition, the new patch merges the collapse_count data from each individual shard response. When using this patch you still need to make sure that all documents of one collapse group stay on one shard; otherwise your collapse result will be incorrect. Documents of different collapse groups can live on different shards.
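
The two constraints above can be sketched in Java as follows. Routing by a hash of the collapse-field value is one assumed way (not something the patch does for you) to keep each collapse group on a single shard; merging the per-shard counts by summing them is likewise an assumption about what merging collapse_count involves. Because every group lives on exactly one shard, the sum reduces to a union of the per-shard maps. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the code in field-collapse-5.patch.
public class ShardCollapseSketch {

    // Index-time routing: same collapse value => same shard, so a group never splits.
    static int shardFor(String collapseValue, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(collapseValue.hashCode(), numShards);
    }

    // Query-time merge of per-shard collapse counts (group value -> collapsed count).
    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> perShardCounts) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : perShardCounts) {
            shard.forEach((value, count) -> merged.merge(value, count, Integer::sum));
        }
        return merged;
    }
}
```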


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment: field_collapsing_1.3.patch

Thank you, Yonik!
Here is the complete version.

P.S.: It's time to go to bed in Europe ...

Emmanuel.


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765196#action_12765196 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Aytek,

As I understand it, each separate filter query produces its own result set, and these result sets are intersected. That means it works the way you want.
I'm not sure, but I think this issue is not related to the patch. I have tried to reproduce this situation (on a different data set), but it behaved as it should, both with and without the patch.
Have you tried fq=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9] instead of having it in two separate fqs?

Martijn
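
Martijn's explanation of filter queries can be illustrated with a small Java sketch: each `fq` is modeled as a `java.util.BitSet` of matching doc ids, and the filters are combined with `BitSet.and`. This is a simplification of Solr's DocSet handling, and the class and method names are hypothetical; the ranges are inclusive, like `[37.2 TO 39.8]`.

```java
import java.util.BitSet;

// Hypothetical sketch: separate filter queries are intersected doc sets.
public class FilterIntersectionSketch {

    // One inclusive range filter over a numeric field, e.g. lat:[lo TO hi].
    static BitSet rangeFilter(double[] values, double lo, double hi) {
        BitSet bits = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] >= lo && values[doc] <= hi) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Intersect all filters (expects at least one); the input sets are not modified.
    static BitSet intersectAll(BitSet... filters) {
        BitSet result = (BitSet) filters[0].clone();
        for (int i = 1; i < filters.length; i++) {
            result.and(filters[i]);
        }
        return result;
    }
}
```

Intersecting the two range filters selects exactly the documents that a single `lat:[...] AND lng:[...]` filter would, which is why splitting them into two `fq` parameters should not change the result.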


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Kevin Cunningham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830305#action_12830305 ] 

Kevin Cunningham commented on SOLR-236:
---------------------------------------

Regarding Patrick's comment about a memory leak, we are seeing something similar: very large memory usage, eventually consuming all the available memory. Were there any confirmed issues that may have been addressed by the later patches? We're using the 12-24 patch. Are there any toggles we can switch to still get the feature, yet minimize the memory footprint?


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567224#action_12567224 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

OK, I think I have the first issue figured out. If the current result set (let's say the first 10 rows) doesn't have the field that we are collapsing on, the counts don't show up. Is that correct?


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Emmanuel Keller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Keller updated SOLR-236:
---------------------------------

    Attachment:     (was: SOLR-236-FieldCollapsing.patch)


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Charles Hornberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564952#action_12564952 ] 

Charles Hornberger commented on SOLR-236:
-----------------------------------------

NegatedDocSet is throwing "Unsupported Operation" exceptions:

org.apache.solr.common.SolrException:Unsupported Operation 
 at org.apache.solr.search.NegatedDocSet.iterator(NegatedDocSet.java:77)
 at org.apache.solr.search.DocSetBase.getBits(DocSet.java:183)  
 at org.apache.solr.search.NegatedDocSet.getBits(NegatedDocSet.java:27) 
 at org.apache.solr.search.DocSetBase.intersection(DocSet.java:199)      
 at org.apache.solr.search.BitDocSet.intersection(BitDocSet.java:30)     
 at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1109)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:811)
 at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1258)
 at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:103)
 at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:155)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)     
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:275)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
 at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
 at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
 at java.lang.Thread.run(Thread.java:595)

I'm not sure which search is triggering this code path, but it does not happen on every request, only on some. I'm firing up the debugger now to see what I can learn, but thought I'd post this anyway in case anyone has tips.
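Per the trace, `NegatedDocSet.getBits` delegates to `DocSetBase.getBits`, which calls `NegatedDocSet.iterator`, and that call throws. One way to intersect with a complement without ever iterating it is the identity A AND (NOT B) = A ANDNOT B. The sketch below shows this with `java.util.BitSet`; the class and method names are hypothetical, it is not Solr's or the patch's `NegatedDocSet`, and it does not explain why only some requests reach this path.

```java
import java.util.BitSet;

// Hypothetical sketch; NOT Solr's or the patch's NegatedDocSet.
// A "negated" set is the complement of `negatedBase` over all docs in the index.
public class NegatedIntersectionSketch {

    /** Docs in `other` that are NOT in `negatedBase`; never iterates the complement. */
    static BitSet intersectWithNegation(BitSet other, BitSet negatedBase) {
        BitSet result = (BitSet) other.clone(); // leave the caller's set untouched
        result.andNot(negatedBase);
        return result;
    }

    /** Materializes the complement explicitly, which requires knowing maxDoc. */
    static BitSet materialize(BitSet negatedBase, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        bits.set(0, maxDoc);      // every document id in [0, maxDoc)
        bits.andNot(negatedBase); // minus the excluded ones
        return bits;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(1);
        matches.set(3);
        matches.set(5);
        BitSet excluded = new BitSet();
        excluded.set(3);
        System.out.println(intersectWithNegation(matches, excluded)); // {1, 5}
        System.out.println(materialize(excluded, 6));                 // {0, 1, 2, 4, 5}
    }
}
```

The `andNot` route only needs the base set, whereas materializing the complement needs the index size (`maxDoc`), which is the information a bare iterator over a negated set would lack.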

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793365#action_12793365 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

bq. We need to open a separate issue for the core related changes. 
As you have probably noticed, I have split the patch into smaller patches and created a sub-issue for each one.

bq. How about we change the current field collapsing response format to the following? 
Looks okay at first sight.

bq. For this to work, CollapseComponent must generate a custom SolrDocumentList and set it as "results" in the response.
Maybe we need a more elegant solution for this. All these extra fields are calculated values. We could put the calculated values into a context, from which the response writers can look them up and write them to the response. Other features might also benefit from this, such as distances from a central point in a geo search. It is just an idea. I recall an issue in Jira that proposes something like this, but I couldn't find it.
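A minimal sketch of that idea, assuming a per-request map from document id to named calculated values that components fill in and that a response writer merges into each document as it writes it. None of these types exist in Solr; every name here is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a "calculated values" context; no such API exists in Solr.
public class CalculatedValuesSketch {

    /** Per-request store: docId -> (value name -> value), filled in by components. */
    static final class CalculatedValues {
        private final Map<Integer, Map<String, Object>> byDoc = new HashMap<>();

        void put(int docId, String name, Object value) {
            byDoc.computeIfAbsent(docId, id -> new HashMap<>()).put(name, value);
        }

        Map<String, Object> get(int docId) {
            return byDoc.getOrDefault(docId, Map.of());
        }
    }

    /** Toy response writer: stored fields plus any calculated values, keys sorted. */
    static String writeDoc(int docId, Map<String, Object> storedFields, CalculatedValues ctx) {
        Map<String, Object> out = new TreeMap<>(storedFields);
        out.putAll(ctx.get(docId));
        return out.toString();
    }

    public static void main(String[] args) {
        CalculatedValues ctx = new CalculatedValues();
        ctx.put(7, "collapseCount", 4); // e.g. set by a collapse component
        ctx.put(7, "distance", 2.5);    // e.g. set by a geo component
        System.out.println(writeDoc(7, Map.of("id", "doc7"), ctx));
        // {collapseCount=4, distance=2.5, id=doc7}
    }
}
```

The point of the design is that the writer does not need to know which component produced a value, so collapse counts and geo distances can share one mechanism instead of each component building its own custom result list.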

bq. "collapse.aggregate" - Can we make this a multi-valued parameter instead of comma separated?
I think that is a good idea; other parameters (like fq) are also multi-valued.
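One practical benefit of a multi-valued parameter is that an individual value can contain a comma without being split apart. The sketch below parses a raw query string into repeated values, the way repeated `fq` parameters arrive. The aggregate expressions `f(a,b)` and `g(c)` are made-up placeholders, not the patch's syntax, and the parser skips URL decoding for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; values like "f(a,b)" are placeholders, not real collapse syntax.
public class MultiValuedParamSketch {

    /** Splits a raw (undecoded) query string, keeping every value of a repeated key. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Comma-separated: one value that the handler must then split on ','.
        String single = parse("collapse.aggregate=f(a,b),g(c)").get("collapse.aggregate").get(0);
        System.out.println(String.join(" | ", single.split(","))); // f(a | b) | g(c)

        // Multi-valued: each expression arrives intact, like repeated fq parameters.
        List<String> values = parse("collapse.aggregate=f(a,b)&collapse.aggregate=g(c)")
                .get("collapse.aggregate");
        System.out.println(String.join(" | ", values));           // f(a,b) | g(c)
    }
}
```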

BTW, I think we should continue further technical discussions in the sub-issues. There is room there for a lot of comments :-)

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Oleg Gnatovskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603616#action_12603616 ] 

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

I'd like to request some distributed search functionality for this feature as well.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "Aytek Ekici (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765076#action_12765076 ] 

Aytek Ekici commented on SOLR-236:
----------------------------------

Hi all,
I just applied "field-collapse-5.patch", and I think there are problems with filter queries.

Here it is:

1- Use only the first filter:
http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]
numFound: 6284

2- Use only the second filter:
http://10.231.14.252:8080/myindex/select?q=*:*&fq=lng:[24.5 TO 29.9]
numFound: 16912

3- Use both filters:
http://10.231.14.252:8080/myindex/select?q=*:*&fq=lat:[37.2 TO 39.8]&fq=lng:[24.5 TO 29.9]
numFound: 19419

4- Use "q" instead of "fq":
http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
numFound: 3777 (the only correct number)

As I understand it, instead of combining the filter queries with "AND", it combines them with "OR". I checked:
http://10.231.14.252:8080/myindex/select?q=lat:[37.2 TO 39.8] OR lng:[24.5 TO 29.9]
numFound: 19419 (same as the 3rd one)

Any idea how to fix this?
Thx.
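The reported counts are consistent with the OR reading. By inclusion-exclusion, |A OR B| = |A| + |B| - |A AND B| = 6284 + 16912 - 3777 = 19419, which is exactly the two-fq result, whereas an AND of the two filters could never exceed the smaller one (6284). The small check below just redoes that arithmetic; the class and method names are illustrative.

```java
// Redoes the inclusion-exclusion arithmetic on the counts reported above.
public class FilterCountCheck {

    /** Inclusion-exclusion: |A OR B| = |A| + |B| - |A AND B|. */
    static long unionSize(long sizeA, long sizeB, long intersectionSize) {
        return sizeA + sizeB - intersectionSize;
    }

    public static void main(String[] args) {
        long lat = 6284;   // fq=lat:[37.2 TO 39.8]
        long lng = 16912;  // fq=lng:[24.5 TO 29.9]
        long both = 3777;  // q=lat:[37.2 TO 39.8] AND lng:[24.5 TO 29.9]
        long union = unionSize(lat, lng, both);
        System.out.println(union);                      // 19419
        System.out.println(union == 19419);             // true: matches the two-fq result
        System.out.println(both <= Math.min(lat, lng)); // true: an AND can't exceed either filter
    }
}
```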

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


[jira] Updated: (SOLR-236) Field collapsing

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-236:
---------------------------------------

    Fix Version/s: 1.4

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Otis Gospodnetic
>             Fix For: 1.4
>
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch


[jira] Commented: (SOLR-236) Field collapsing

Posted by "German Attanasio Ruiz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781279#action_12781279 ] 

German Attanasio Ruiz commented on SOLR-236:
--------------------------------------------

Tomorrow I'm going to try the patch. Next time, I hope to help rather than only report the problem.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch