You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Martijn van Groningen (JIRA)" <ji...@apache.org> on 2011/06/01 01:10:47 UTC

[jira] [Created] (SOLR-2564) Integrating grouping module into Solr 4.0

Integrating grouping module into Solr 4.0
-----------------------------------------

                 Key: SOLR-2564
                 URL: https://issues.apache.org/jira/browse/SOLR-2564
             Project: Solr
          Issue Type: Improvement
            Reporter: Martijn van Groningen
            Assignee: Martijn van Groningen
             Fix For: 4.0


Since work on grouping module is going well. I think it is time to wire this up in Solr.
Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: [jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by Simon Willnauer <si...@googlemail.com>.
On Wed, Jun 8, 2011 at 2:27 PM, Robert Muir (JIRA) <ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045924#comment-13045924 ]
>
> Robert Muir commented on SOLR-2564:
> -----------------------------------
>
> just send an email to the dev list... lots of people will +1, uwe will -1, but I dont see why this is any issue for 4.0

+1 :)
>
>> Integrating grouping module into Solr 4.0
>> -----------------------------------------
>>
>>                 Key: SOLR-2564
>>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>>             Project: Solr
>>          Issue Type: Improvement
>>            Reporter: Martijn van Groningen
>>            Assignee: Martijn van Groningen
>>             Fix For: 4.0
>>
>>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>>
>>
>> Since work on grouping module is going well. I think it is time to wire this up in Solr.
>> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048307#comment-13048307 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

Patch looks great Martijn... only thing pending (I think?) is to perhaps disable caching collector when the query is MatchAllDocsQuery instance, since we know cache only hurts perf in this case.

Else, +1 to commit!  I don't think we should hold this pending the Java 1.6 discussion.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044867#comment-13044867 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

bq. > The other use-case is more like field collapsing and does change what documents match (basically, only the first documents in each group, up to limit, "match").

bq. I'm not sure it's that simple, ie that we can so cleanly model collapsing as reducing the docs to consider and then running faceting on that reduced set.

I *think* that's what was actually implemented in SOLR-236 IIRC, and what some people seem to be asking for.

bq. EG, the use case of getting correct facet counts for a field that has different values within the group, can't be handled by this approach?

Well, correct is a matter of context ;-) (for example, some have called the facet counts for the current grouping implementation "incorrect" because it didn't happen to match their use case).  Looking at the original description in LUCENE-3097, it seems you're talking about Martijn's 3rd method, while I was talking about the 2nd.  But maybe some people that were originally advocating for #2, really wanted #3?



> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046103#comment-13046103 ] 

Robert Muir commented on SOLR-2564:
-----------------------------------

{quote}
Another option is to allow the grouping module (separately from Lucene core) to use Java 6 code.... but even that could be "involved" 
{quote}

Personally I am against parts of the code being java 6 and other parts being java 5. we already have this situation today (the solr part is java 6, everyhting else is java 5).

Come on, its a major version, lets just cut over everything to java 6. Java 5 isn't even supported by oracle anymore, so why the hell do we support it?


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063342#comment-13063342 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

I've fixed the issue and added a test that triggered the exception.
Fixed in trunk in revision 1145173
Fixed in 3x branch in revision 1145176

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564_performance_loss_fix.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated SOLR-2564:
-------------------------------------

    Attachment: LUCENE-2564.patch


Patch looks great!

Attached new patch with minor fixes:

  * I had to fixup Solr's common-build.xml (it was referencing
    lucene/contrib/grouping from 3.x).  After that all tests pass.

  * Fixed jdocs for GroupParams.GROUP_CACHE (we now default to true
    not false, right?)

I think GroupParams.GROUP_FACETS (group.docSet request param) is
unused?  Should we remove it?  (This is for when we do LUCENE-3097, I
think?)

So this cutover improves Solr trunk grouping's impl by adding option
to get total group count (group.totalCount request param) and using
the CachingCollector for fast replaying of hits for 2nd pass collector
(adds group.cache/group.cache.maxSizeMB request params).  The
double-RAM issue is fixed, thanks to LUCENE-3099.

So I think this is ready to be committed!!


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046125#comment-13046125 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

bq. If the 16% slowdown is worst case

Actually, the worst case is twice as slow due to unneeded caching of a simple query.  Luckily this can be configured... but I still question the default, which can lead to surprisingly huge memory use (think up to a field cache entry or more allocated per-request).  One advantage to the dual-pass approach by default in the first place was avoiding surprisingly large memory usage by default (which can degrade less gracefully by causing OOM exceptions as people try to crank up the number of request threads).


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063244#comment-13063244 ] 

Simon Willnauer commented on SOLR-2564:
---------------------------------------

bq. Since Lucene is now also Java 6 we can just change the code in AbstractFirstPassGroupingCollector and the TermFirstPassGroupingCollectorJava6 in grouping.java is no longer needed, right?
yes thats right

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564_performance_loss_fix.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen resolved SOLR-2564.
-----------------------------------------

    Resolution: Fixed

Committed in revision 1137037

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044095#comment-13044095 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

FYI: there's a nocommit in there misspelled as "No commit"


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045027#comment-13045027 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

I've been checking out the performance, and it generally seems fine.  But of course we normally short circuit based on comparators and often don't get beyond that... so to exercise & isolate the rest of the code, I tried a worst-case scenario where the short circuit wouldn't work (sort=_docid_ desc) and solr trunk with this patch is ~16% slower than without it.  Any ideas what the problem might be?

{code}
http://localhost:8983/solr/select?q=*:*&sort=_docid_ desc&group=true&group.cacheMB=0&group.field=single1000_i
{code}

Note: the single1000_i field is a single valued int field with 1000 unique values

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045079#comment-13045079 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

Hmmm.  Was this with or without caching?

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044927#comment-13044927 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

I think a good criteria for "correct" is if you were to click through on the facet (ie, take the current query and add a filter on facet field = facet value), would the hit count you see match the facet count you were just looking at?

Ie, drill down should be "consistent".

Both approaches will give the same facets counts if the field never varies within the group (ie, the field "belongs" to the "parent" docs); it's only "child" fields where you need faceting to be aware of the groups, so for apps that never display facets on child fields, only computing facets on the group heads will work.

I suspect doc blocks will be the only practical way to implement faceting on child fields efficiently.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2564:
----------------------------------------

    Attachment: SOLR-2564.patch

Initial patch. All tests, but the random test pass. All existing grouping test must pass without changes.

This patch also includes collectors for function queries.

LUCENE-3099 must be applied first before using the patch.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2564:
----------------------------------------

    Attachment: SOLR-2564.patch

Attached a new patch that should select the term based collectors now more often:
* In the case of grouping by function and the value source is StrFieldValueSource.
* All field types that produce a StrFieldValueSource (in getValueSource) use now the term based collectors. So any custom field type can now also be supported. Both TextField and StrField produce a StrFieldValueSource. I'm not if this is the right approach, but it was the most easy way to implement. 

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044940#comment-13044940 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

bq. But: why not just use Double.MAX_VALUE?
Yes, I should have used that and I'll change that. I thought that the size was initially used to create the underline array. But it isn't! The array inside the caching collector initially starts with a length 128 and grows when needed.

How I've currently implemented LUCENE-3097 is that it will only get the most relevant document of each group. In terms of SOLR-236 that is the same as using collapse.threshold=1. I think what Yonik means is increasing the threshold so more documents and up in the docset, that eventually is used by the facet component. Increasing this threshold also means setting when to start to collapse. So when setting the collapse.threshold=3 this means that from the 4th document the collapsing starts. I think that the whole collaps.threshold feature doesn't scale very well. 

Anyway, I think when we go wire the 2nd method (LUCENE-3097) into Solr, we should first make it work for the most relevant group documents.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046105#comment-13046105 ] 

Ryan McKinley commented on SOLR-2564:
-------------------------------------

bq. Java 5 isn't even supported by oracle anymore, so why the hell do we support it?

+1  (though this discussion should happen elsewhere)

If the 16% slowdown is worst case and under 'normal' use it would be equivolent/faster, i say lets move forward with that and push for java6 in a different issue.  Having a java6 version of lucene module (in solr?) seems like an mess.



> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044255#comment-13044255 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

{quote}
Do we have a group.facet = after ? I still like the old collapse component better because it had the collapse.facet=after.

I think there is another ticket, but not sure if it is being worked on?
{quote}
No, not in this issue.  This issue is just about cutting Solr over to the grouping module.

LUCENE-3097 is the issue for post-grouping facet counts.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044283#comment-13044283 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

bq. We can choose to leave out the upper-casing. Solr users would then need make sure that parameter options are spelled correctly. Would that be allright?

It seems like "typically" the request param values must be lower-case?
(Which I think is good...).

So, maybe... we make an exception to enum casing (normally ALL CAPS)
and use lowercase for enums that corresopnd to valid values for
request params?  Then we can just do .valueOf() directly, and we get
the strong checking to catch mis-spellings.


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044043#comment-13044043 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

bq. Fixed jdocs for GroupParams.GROUP_CACHE (we now default to true not false, right?)
Yes we do.

{quote}
I think GroupParams.GROUP_FACETS (group.docSet request param) is
unused? Should we remove it? (This is for when we do LUCENE-3097, I
think?)
{quote}
I 'reserved' this parameter for LUCENE-3097. Should we keep it? Once LUCENE-3097 is committed, I will quickly integrate it into Solr.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-2564:
-------------------------------

    Attachment: SOLR-2564_performance_loss_fix.patch

Here's a simple patch that fixes the performance we lost when Solr cut over to the modules implementation (due to the fact that modules are on Java5 and Solr is on Java6).

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564_performance_loss_fix.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045904#comment-13045904 ] 

Martijn van Groningen edited comment on SOLR-2564 at 6/8/11 11:12 AM:
----------------------------------------------------------------------

{quote}
I've been checking out the performance, and it generally seems fine. But of course we normally short circuit based on comparators and often don't get beyond that... so to exercise & isolate the rest of the code, I tried a worst-case scenario where the short circuit wouldn't work (sort=docid desc) and solr trunk with this patch is ~16% slower than without it. Any ideas what the problem might be?
{quote}

What might be the problem is that the trunk is using (Grouping.java 589):
{code}
SearchGroup smallest = orderedGroups.pollLast();
{code}

Whilst the AbstractFirstPassGroupingCollector (line 217) is using:
{code}
final CollectedSearchGroup<GROUP_VALUE_TYPE> bottomGroup = orderedGroups.last();
orderedGroups.remove(bottomGroup);
{code}
The above also happen around line 271.

I haven't checked this out, but I think it is the most likely explanation between those two implementations. Retrieving the bottom group will be done in almost all cases when the short circuit doesn't work 

      was (Author: martijn.v.groningen):
    {quote}
I've been checking out the performance, and it generally seems fine. But of course we normally short circuit based on comparators and often don't get beyond that... so to exercise & isolate the rest of the code, I tried a worst-case scenario where the short circuit wouldn't work (sort=docid desc) and solr trunk with this patch is ~16% slower than without it. Any ideas what the problem might be?
{quote}

What might be the problem is that the trunk is using (Grouping.java 589):
{code}
SearchGroup smallest = orderedGroups.pollLast();
{code}

Whilst the AbstractFirstPassGroupingCollector (line 217) is using:
{code}
final CollectedSearchGroup<GROUP_VALUE_TYPE> bottomGroup = orderedGroups.last();
orderedGroups.remove(bottomGroup);
{code}
The above also happen around line 271.

I haven't checked this out, but I think it is the most likely explanation between those two implementations. Retrieving the bottom will be done in almost all cases when the short circuit doesn't work 
  
> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050677#comment-13050677 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

I also did some performance tests with the following query on random data in the example schema:
{code}http://localhost:8983/solr/select?q=*:*&sort=_docid_ desc&group=true&group.cacheMB=0&group.field=single1000_i{code}
The field single1000_i had 1000 distinct values and the index has in total 100000 documents.

I ran this query on the following Solr setups:
* Last nights nightly build.
* Solr build with this patch as it is.
* Solr build with this patch and the necessary changes in AbstractFirstPassGroupingCollector so that pollLast was used in all cases.
During my tests I noticed that differences between the first and the second setups was neglectable smal, but the the last Solr setup was on average 32% faster than the two other setups. So moving to the Java6's pollLast() method has definitely a positive impact on performance!

I also think that this patch is ready to be committed and that the pollLast should be added when Lucene or the grouping module is java 6. (I prefer the first option) I'll commit it in the coming day or so.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044820#comment-13044820 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

Patch looks great Martijn!

Only thing I noticed is cacheSizeMB is computed incorrectly from
maxDoc (for the -1 case), because that's all int math I think?  Ie
it'll be truncated from eg 13.7 MB -> 13.  But: why not just use
Double.MAX_VALUE?


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046072#comment-13046072 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

I think we should decouple the "should Lucene require Java 6" (which I expect to be an..... involved, discussion) from making progress here?

My feeling is we should still commit this.  The 16% slowdown is on a very synthetic case (MatchAllDocsQuery, sorting by reversed docID, grouping by random int field)... unless we also see unacceptable slowdowns in more realistic cases?  Also, net/net the user should see a speedup, typically, since caching is enabled by default.

We should still open an issue to cutover this code back to the pollLast once we can use Java 6 code.

Another option is to allow the grouping module (separately from Lucene core) to use Java 6 code.... but even that could be "involved" :)

Yonik, how do you create the index used for this test?  Somehow you generate an int field w/ random 1000 unique values -- do you have a client-side script you use to create random docs in Solr?

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045087#comment-13045087 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

This was without caching to put them on an even footing (and given that the base query was all docs, caching would be slower anyway).  The URL above was the actual one used to test.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045904#comment-13045904 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

{quote}
I've been checking out the performance, and it generally seems fine. But of course we normally short circuit based on comparators and often don't get beyond that... so to exercise & isolate the rest of the code, I tried a worst-case scenario where the short circuit wouldn't work (sort=docid desc) and solr trunk with this patch is ~16% slower than without it. Any ideas what the problem might be?
{quote}

What might be the problem is that the trunk is using (Grouping.java 589):
{code}
SearchGroup smallest = orderedGroups.pollLast();
{code}

Whilst the AbstractFirstPassGroupingCollector (line 217) is using:
{code}
final CollectedSearchGroup<GROUP_VALUE_TYPE> bottomGroup = orderedGroups.last();
orderedGroups.remove(bottomGroup);
{code}
The above also happen around line 271.

I haven't checked this out, but I think it is the most likely explanation between those two implementations. Retrieving the bottom will be done in almost all cases when the short circuit doesn't work 

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044573#comment-13044573 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

{quote}
groupCount is now GROUPED or UNGROUPED, and is used to set "Accuracy" (more on that later 
Seems like this parameter should be a boolean that says if the total number of groups should be returned?
If true, we can add a "ngroups" or "groupCount" element at the same level as "matches". We should probably just name the parameter the same thing as the variable that gets returned... i.e. group.ngroups=true would cause "ngroups" to be populated (or groupCount if we decide that's a better name).
{quote}
I like that. I'll update the patch soon, so it includes ngroups together with matches, if group.ngroups=true is specified. 

bq. "method" should probably be reserved for the algorithm used for collapsing (as we do for faceting).
Maybe we can use this for letting the user choose one of the Term|Block|Function|*GroupCollectors. The collectors define the way how groups are created.

bq. However for now if it's easier, we could treat group.collapse=true to apply to the base query and all filters, and handle the use case I mentioned above via a qparser in the future
I agree let's commit this first and then start to enhance the collapse feature. The group.collapse should be included when LUCENE-3097 is finished which shouldn't take that long! 

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046091#comment-13046091 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

bq. Another option is to allow the grouping module (separately from Lucene core) to use Java 6 code

+1

bq. Yonik, how do you create the index used for this test? Somehow you generate an int field w/ random 1000 unique values – do you have a client-side script you use to create random docs in Solr?

I have some CSV files laying around that I reuse for ad-hoc testing of a lot of stuff.  They were created with a simple python script.
Then I simply index with
{code}
URL=http://localhost:8983/solr
curl "$URL/update/csv?stream.url=file:/tmp/test.csv&overwrite=false&commit=true"
{code}

It was also my first reaction to think that this is a very synthetic case that people are unlikely to hit... until I thought about dates.  Indexing everything in date order is a pretty common thing to do, and so is sorting by date - which hits the exact same case.  Queries of *:* and simple filter queries on type, etc, also tend to be pretty common (i.e. full-text relevance/performance actually isn't an important feature for some users).

How complex must queries be for caching to generate a net benefit under load? I haven't tried to test this myself.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2564:
----------------------------------------

    Attachment: SOLR-2564.patch

Attached an updated patch.
* Renamed group.groupCount to group.ngroups
* When group.ngroups=true ngroups is added with matches in the response.
* Included group.ngroups option into random testing. (all grouping tests succeed) 

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044821#comment-13044821 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

bq. The other use-case is more like field collapsing and does change what documents match (basically, only the first documents in each group, up to limit, "match").

I'm not sure it's that simple, ie that we can so cleanly model
collapsing as reducing the docs to consider and then running faceting
on that reduced set.

EG, the use case of getting correct facet counts for a field that has
different values within the group, can't be handled by this approach?
This is the count=2 for size=S in my example at
https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038605&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038605

I think to do that properly, the faceting impl needs to see all docs
in the group, not just the "lead doc" per group.

I think another way to visualize/model this that we really need to be
able to configure "which field counts" (ID_FIELD) for the schema.
This field would then decide all counts -- total "hit count", facet
counts, etc., ie each of these counts is count(unique(ID_FIELD)) of
the docs falling in that facet/result set.  The default is Lucene's docid,
but the app should be able to state any other ID_FIELD.


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045924#comment-13045924 ] 

Robert Muir commented on SOLR-2564:
-----------------------------------

just send an email to the dev list... lots of people will +1, uwe will -1, but I dont see why this is any issue for 4.0

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044080#comment-13044080 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

I haven't been following this... I took a quick look at the patch, and at first blush it's hard to tell what changes are cleanup and what changes are cut-over.

 - in the QueryComponent, why the change to set the GET_SCORES flag based on the sort(s)?
 - I'm not a fan of this new style for matching request parameters to enums... solr does a *lot* of lookups on a typical request, and a switch to this style everywhere could definitely have an impact (the whole upper-casing the request param so we can match it to the enum name). 
 - "Accuracy" seems a bit mis-named?  It seems to imply an accuracy trade-off, but both methods are 100% accurate here, they just do different things to serve different usecases.  At least the name doesn't seem to have made it's way into the external API though.
 - The parameter "group.totalCount" I would expect to return the total count of something, not control the pre/post faceting thing?  (or are the comments just wrong?)  If it's to return the number of groups, then perhaps the name should be "group.groupCount" as totalCount is unit-less.
 - What does "group.docSet" do?  The comments don't quite make sense to me, but the param suggests it's sort of like group.totalCount?
 - I'm not sure we should default group.cache to true... there's a downside to the memory usage, and it's fragile: things may be working just fine, and the user may add a few more documents to the index and then the limit is hit and it just stops working (but still consumes much memory and extra log warnings per request).
 - in the interest of reducing the number of parameters, we could dump group.cache and have a single group.cacheMB parameter that uses 0 as no cache, -1 as maximum needed (solr uses -1 in this manner in other places too), and other values as literal number of MB (which I'd discourage people from using personally).


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2564:
---------------------------

    Priority: Blocker  (was: Major)

Marking this issue as a blocker for Solr 4.0 per McCandless comment in SOLR-2524...

{quote}
That said, the plan is definitely to get Solr 4.0 cutover to the
grouping module; it's just a matter of time. I don't think we should
ship 4.0 until we've done so.
{quote}

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041892#comment-13041892 ] 

Martijn van Groningen edited comment on SOLR-2564 at 5/31/11 11:15 PM:
-----------------------------------------------------------------------

Initial patch. All tests, but the random test pass. The goal is that all existing grouping test must pass without any changes to the TestGroupingSearch class.

This patch also includes collectors for function queries.

LUCENE-3099 must be applied first before using the patch.

      was (Author: martijn.v.groningen):
    Initial patch. All tests, but the random test pass. All existing grouping test must pass without changes.

This patch also includes collectors for function queries.

LUCENE-3099 must be applied first before using the patch.
  
> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046766#comment-13046766 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

Ahh, I see.  Could we turn off caching if the query is instanceof AllDocsQuery?

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045002#comment-13045002 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

Browsing around this a bit more... the existing solr code selected the string based collectors for any ValueSource of StrFieldSource.  This patch resorts to exact getClass() checks against string and text fields which won't match in as many cases (either derived fields, or user custom fields that don't derive from either of the these field types)

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046911#comment-13046911 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

+1

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047918#comment-13047918 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

bq. it looks like we should hold SOLR-2524 until we commit this, then backport & commit to 3.x.
I agree with that. Also most of the discussion takes place in this issue. 

I think we're getting close to commit this issue.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047478#comment-13047478 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

Hmm, I think this only needs to be a 4.0 blocker if we commit SOLR-2524 (3.x Solr grouping) first.

But at this point, since we are close on this issue, it looks like we should hold SOLR-2524 until we commit this, then backport & commit to 3.x.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Matteo Melli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052492#comment-13052492 ] 

Matteo Melli commented on SOLR-2564:
------------------------------------

Hi there,

I'm testing this functionality into my project and found what I think it's a bug. The revision I'm working on is 1137889.

I could reproduce the bug with a really simple index (the column is of type solr.String):

|| Col1 ||
| 1 |
| 2 |
| 3 |

The bug appear when I try to do a query with grouping mixing parameters start (with a value greather than 0) and group.main=true:

http://localhost:8983/solr/test/select/?q=*:*&start=1&group=true&group.field=Col1&group.main=true

The error trace is:

Jun 21, 2011 1:32:10 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 3
	at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:119)
	at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:247)
	at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:153)
	at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111)
	at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:37)
	at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:340)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:242)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)

The problem does not appear without group.main=true so this may be a related bug to that option.

PS: I was not sure if there where to open a bug since the version affected is still in development. Anyway sorry for any inconvenient.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044253#comment-13044253 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

bq. I haven't been following this... 

Thank you for the review Yonik!

bq. I took a quick look at the patch, and at first blush it's hard to tell what changes are cleanup and what changes are cut-over.

It's definitely a big refactoring.

bq. in the QueryComponent, why the change to set the GET_SCORES flag based on the sort(s)?

I assume this is because if you sort or groupSort by a Sort that
contains SortField.SCORE, you need a scorer.  Not sure why Solr trunk
gets away with not doing this, though... Martijn do you know?  Oh, and
likely because the CachingCollector needs to know up front if scores
are needed.

bq. I'm not a fan of this new style for matching request parameters to enums... solr does a lot of lookups on a typical request, and a switch to this style everywhere could definitely have an impact (the whole upper-casing the request param so we can match it to the enum name).

The .upper() calls (twice per request, only if the request has
group=true) are negligible here (true, other situations might be
different, though we should fix them to lookup/check once).

And, this approach gives us strong checking of the enum fields.
Contrast that with what's on trunk now:

{noformat}
  String format = params.get(GroupParams.GROUP_FORMAT);
  Grouping.Format defaultFormat = "simple".equals(format) ? Grouping.Format.Simple : Grouping.Format.Grouped; 
{noformat}

If you have a typo in "simple", you'll unexpectedly get
Grouping.Format.Grouped.

I think we should strongly check all of our request params?

{quote}
"Accuracy" seems a bit mis-named? It seems to imply an accuracy trade-off, but both methods are 100% accurate here, they just do different things to serve different usecases. At least the name doesn't seem to have made it's way into the external API though.

The parameter "group.totalCount" I would expect to return the total count of something, not control the pre/post faceting thing? (or are the comments just wrong?) If it's to return the number of groups, then perhaps the name should be "group.groupCount" as totalCount is unit-less.
{quote}

I agree.  This setting just controls what the "total hit count" should
count -- unique groups vs docs.

How about TotalCount.GROUPS and TotalCount.DOCS?

bq. What does "group.docSet" do? The comments don't quite make sense to me, but the param suggests it's sort of like group.totalCount?

I think we should remove this from this patch (see my comment above --
it applies to LUCENE-3097).

bq. in the interest of reducing the number of parameters, we could dump group.cache and have a single group.cacheMB parameter that uses 0 as no cache, -1 as maximum needed (solr uses -1 in this manner in other places too), and other values as literal number of MB (which I'd discourage people from using personally).

+1, that makes sense.

bq. I'm not sure we should default group.cache to true... there's a downside to the memory usage, and it's fragile: things may be working just fine, and the user may add a few more documents to the index and then the limit is hit and it just stops working (but still consumes much memory and extra log warnings per request).

Enabling caching can make a huge improvement in QPS, especially for
queries that are costly to execute but don't match too many docs.

Maybe instead of a fixed MB we could make it a percentage of the
maxDoc?  This would make it less fragile..  So you could say you're
willing to cache up to 50% of the total docs in the index.

bq. FYI: there's a nocommit in there misspelled as "No commit"

Martijn can you fix this one...?  And in general try to spell it as
"nocommit" :) This is what our build catches.  (And, fear not, you're
not the only person to have trouble spelling nocommit!!).


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044122#comment-13044122 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

In QueryComponent, is it possible to remove this?
{code}
+        if (limitDefault < 0) {
+          limitDefault = searcher.maxDoc();
+        }
{code}

It wasn't necessary before, and there are advantages to preserving information (like the fact that someone said "no limit" vs a specific number) until as late as possible.  That was previously handled by getMax() in Grouping.java, and I still see it being called... so it should be OK?


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044322#comment-13044322 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

bq. So, maybe... we make an exception to enum casing (normally ALL CAPS) and use lowercase for enums that corresopnd to valid values for request params? 

+1 - seems like a good tradeoff, and the fact that it's scoped by the classname still makes it stand out.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045920#comment-13045920 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

Ah, good call Martijn - it must be that pollLast was replaced with two map operations.
Too bad Lucene isn't on Java6 yet!

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046707#comment-13046707 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

bq. > Actually, the worst case is twice as slow due to unneeded caching of a simple query.
bq. Sorry, what do you mean here?

The worst case with this patch as a whole (due to the caching by default).
This type of query is twice as slow:
{code}
http://localhost:8983/solr/select?q=*:*&group=true&group.field=single1000_i
{code}

Which led to me wondering about how complex queries must be before the caching is a win.



> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044267#comment-13044267 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

bq. How about TotalCount.GROUPS and TotalCount.DOCS?
+1 I think we can with that name for the total counts. 

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2564:
----------------------------------------

    Attachment: SOLR-2564.patch

Attached an update version of the patch.
The group.cache.percent defaults now to 0, which disables the grouping cache.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2564:
----------------------------------------

    Attachment: SOLR-2564.patch

Added updated patch. All tests are passing now. I think that when LUCENE-3099 is committed, this patch can also be committed.

Also included a small fix to the CachingCollector. During replaying the scorer is only set once. This must be done for every segment. This fix resolves that issue

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063272#comment-13063272 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

Hi Matteo, I can also confirm the bug and only happens when group.main=true. I also think that this error occurs on 3x code base. I'll provide a fix for this issue soon.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564_performance_loss_fix.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2564:
----------------------------------------

    Attachment: SOLR-2564.patch

Hi Yonik,

It is good to know that you took a look at the patch!

bq. in the QueryComponent, why the change to set the GET_SCORES flag based on the sort(s)?
Yes I did this because I used to set Grouping.needScores with this flag. The needScores I also used whether to indicate if the scores need to be cached. However I have changed this in the updated patch and basically this check isn't done with setting GET_SCORES flag. 

bq. I'm not a fan of this new style for matching request parameters to enums...
We can choose to leave out the upper-casing. Solr users would then need make sure that parameter options are spelled correctly. Would that be allright? 

bq. "Accuracy" seems a bit mis-named?
Maybe another name is more descriptive. Maybe style or method?

bq. The parameter "group.totalCount" I would expect to return the total count of something, not control the pre/post faceting thing?
The jdoc is mixed up with group.docSet. I also think that group.groupCount is a better name. I changed this in the new patch 

bq. What does "group.docSet" do?
Currently nothing. I plan to use it when I finish LUCENE-3097. Basically it will decide whether the docset (for FacetComponent and StatsComponent) is based on plain documents or groups. Since you can have more than one Command (Field / Function / Query), it will then select the first CommandField or CommandFunction. I'm not sure how we should handle multiple command when having more than one command. 

bq. I'm not sure we should default group.cache to true
The query time can really be reduced with this option, but yes it requires more memory. If the cache collector threshold is met they array is immediately set to null during the search, so gc might be able to clean it up during the search. Also Solr users get a message in the response. Somehow I forget to move that from SOLR-2524, but it is in the updated patch now.

bq. we could dump group.cache and have a single group.cacheMB parameter that uses 0 as no cache, -1 as maximum needed (solr uses -1 in this manner in other places too)
Makes sense, grouping then at least consistent with the rest of Solr. I made it default to -1 for now.

bq. FYI: there's a nocommit in there misspelled as "No commit"
I have removed that.

{quote}
It wasn't necessary before, and there are advantages to preserving information (like the fact that someone said "no limit" vs a specific number) until as late as possible. That was previously handled by getMax() in Grouping.java, and I still see it being called... so it should be OK?
{quote}
I've removed this if statement and made sure that getMax(...) is used wherever it is needed.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046150#comment-13046150 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

bq. Actually, the worst case is twice as slow due to unneeded caching of a simple query.

Sorry, what do you mean here?

bq. but I still question the default, which can lead to surprisingly huge memory use (think up to a field cache entry or more allocated per-request).

I agree; -1 is a dangerous default.

But I think caching should still default to on, just limited as a pctg
of the number of docs in the index.  Ie, by default we will cache the
result set if it's less than 20% (say) of total docs in your index.
Else we fallback to 2-pass.

I think this matches how Solr handles caching filters now?  Ie, filter
cache evicts by total filter count and not net MB right, I think?  So
that if you have more docs in your index you'll spending more RAM on
the caching...

Costly queries that return a smallish result set can see big gains
from the caching.


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046002#comment-13046002 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

Ahh nice catch Martijn!

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052857#comment-13052857 ] 

Michael McCandless commented on SOLR-2564:
------------------------------------------

That fix looks great Yonik!  Let the subclass use Java 6 only code...

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564_performance_loss_fix.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063231#comment-13063231 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

Since Lucene is now also Java 6 we can just change the code in AbstractFirstPassGroupingCollector and the TermFirstPassGroupingCollectorJava6 in grouping.java is no longer needed, right?

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564_performance_loss_fix.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Bill Bell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044187#comment-13044187 ] 

Bill Bell commented on SOLR-2564:
---------------------------------

Do we have a group.facet = after ? I still like the old collapse component better because it had the collapse.facet=after.

I think there is another ticket, but not sure if it is being worked on?

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052856#comment-13052856 ] 

Robert Muir commented on SOLR-2564:
-----------------------------------

+1 thats a really good solution, fixes the issue and we can still table the java5/java6 stuff for later.


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564_performance_loss_fix.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2564:
----------------------------------------

    Attachment: SOLR-2564.patch

Added updated patch.
* The caching behaviour is now controlled by group.cache.percent=[0-100]. This percentage is relative to maxdoc. A value of 0 will disable group caching and the default is 20.
* Added hits to log as requested in SOLR-2337. This will add hits also when group.main is not true. 

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046106#comment-13046106 ] 

Erik Hatcher commented on SOLR-2564:
------------------------------------

Just for grins, here's a Ruby script that'll do it (provided you have the solr-ruby gem installed):

{code}
require 'solr'
solr = Solr::Connection.new
1.upto(1000) {|i| solr.add(:id=>i, :single1000_i=>i)}
solr.commit
{code}


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044535#comment-13044535 ] 

Yonik Seeley commented on SOLR-2564:
------------------------------------

Thanks for theupdate Martijn, it's looking good.

Just a note that the following optimization will no longer be valid once we have "post collapse faceting" or whatever we're calling it, or
when we have an option to return the total number of groups.
But hopefully our random testing will catch that in the future.

{code}
    protected Collector createFirstPassCollector() throws IOException {
      // Ok we don't want groups, but do want a total count
      if (actualGroupsToFind <= 0) {
        fallBackCollector = new TotalHitCountCollector();
        return fallBackCollector;
      }
{code}

bq. However I have changed this in the updated patch and basically this check isn't done with setting GET_SCORES flag.

Thanks... GET_SCORES does have a different meaning: scores must be returned to the caller (which can still be false even if scores are used for sorting)

bq. I also think that group.groupCount is a better name.

groupCount is now GROUPED or UNGROUPED, and is used to set "Accuracy" (more on that later ;-)
Seems like this parameter should be a boolean that says if the total number of groups should be returned?
If true, we can add a "ngroups" or "groupCount" element at the same level as "matches".  We should probably just name the parameter the same thing as the variable that gets returned... i.e. group.ngroups=true would cause "ngroups" to be populated (or groupCount if we decide that's a better name).

bq. Maybe another name is more descriptive. Maybe style or method?
"method" should probably be reserved for the algorithm used for collapsing (as we do for faceting).

Background for others: This feature has been called many things like "post collapse faceting", etc.  But it's really much more than that.  Normal grouping simply groups documents and presents them in a different way, but does *not* change what documents match the base query + filters.  The other use-case is more like field collapsing and does change what documents match (basically, only the first documents in each group, up to limit, "match").

Maybe just use a word from the original name for this whole feature... "group.collapse=true"?

There are some other interesting semantics to work out for group.collapse=true, such as if the collapsing happens before or after filters are applied.  Perhaps either could make sense depending on the use case?  Here's one use case I can think of: using field collapsing for only showing the latest version of a document.  In this case, one would only want collapsing to apply to the base query (with filtering happening after that) because you don't want to get into the position of having a filter that filters out the most recent version of a document and thus shows an older version.

However for now if it's easier, we could treat group.collapse=true to apply to the base query and all filters, and handle the use case I mentioned above via a qparser in the future.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046845#comment-13046845 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

{quote}
But I think caching should still default to on, just limited as a pctg
of the number of docs in the index. Ie, by default we will cache the
result set if it's less than 20% (say) of total docs in your index.
{quote}
Maybe instead of specifying a maximum size for the second pass cache, we could specify it with a percentage (0 till 100) relative from maxdoc. In this case when the index grows in number of documents the cache is still used for a lot of queries (depending on the specified percentage). So if we go with this maybe group.cacheMB should be renamed to group.cache.percentage. The default can then be something like 20. Any thoughts about this?

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048360#comment-13048360 ] 

Martijn van Groningen edited comment on SOLR-2564 at 6/12/11 3:47 PM:
----------------------------------------------------------------------

Attached an update version of the patch.
* The group.cache.percent defaults now to 0, which disables the grouping cache.

      was (Author: martijn.v.groningen):
    Attached an update version of the patch.
The group.cache.percent defaults now to 0, which disables the grouping cache.
  
> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048358#comment-13048358 ] 

Martijn van Groningen commented on SOLR-2564:
---------------------------------------------

After some more testing I think it is better to disable the group cache by default.
Here are my small test results on a index of 30M documents. Test were produced on a low end machine.
||Query type||Query||Nr of matches||Search time with caching||Search time without caching||
|Term query|country:es|24270711|2564|2261| 
|Boolean query|country:es OR country:tr|25951654|4014 |4170|
|Fuzzy query|region:riviera~0.001|977043|146|189|
|Wildcard query|region:r*|1608781|177|190|
|Complexer boolean query|region:r* OR region:riveria~0.0001 OR country:es OR country:tr|27582783|5328|7289|
So I think that we can conclude from here is that the group cache work well for boolean, wildcard and fuzzy queries.

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org