You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/06/10 21:48:58 UTC

[jira] [Created] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Add TopDocs.merge to merge multiple TopDocs
-------------------------------------------

                 Key: LUCENE-3191
                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Michael McCandless
             Fix For: 3.3, 4.0


It's not easy today to merge TopDocs, eg produced by multiple shards,
supporting arbitrary Sort.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056339#comment-13056339 ] 

Uwe Schindler edited comment on LUCENE-3191 at 6/28/11 7:34 AM:
----------------------------------------------------------------

bq. I haven't tracked this issue to understand whats going on here, but you cannot pass BytesRefs to collator.compare. If this code is ever reached (and looking at the test i wrote for this damn thing, its unclear if this code is even necessary?!)

The code is necessary once you -cross segments during compare- merge TopDocs. -Maybe the index in the tests is so small that it never has multiple segments.-

The clover report shows that this is not the case, it even shows that compare does not use compareBottom():
[https://builds.apache.org/job/Lucene-trunk/clover/org/apache/lucene/search/SlowCollatedStringComparator.html]

-It never ever calls even compareBottom() so this index is really small. We should make it bigger to test correctly. The getValue() method is only called because the actual String (previously: BytesRef) values are placed in TopFieldDocs.-

      was (Author: thetaphi):
    bq. I haven't tracked this issue to understand whats going on here, but you cannot pass BytesRefs to collator.compare. If this code is ever reached (and looking at the test i wrote for this damn thing, its unclear if this code is even necessary?!)

The code is necessary once you cross segments during compare. Maybe the index in the tests is so small that it never has multiple segments.

The clover report shows that this is the case:
[https://builds.apache.org/job/Lucene-trunk/clover/org/apache/lucene/search/SlowCollatedStringComparator.html]

It never ever calls even compareBottom() so this index is really small. We should make it bigger to test correctly. The getValue() method is only called because the actual String (previously: BytesRef) values are placed in TopFieldDocs.
  
> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3191:
---------------------------------------

    Attachment: LUCENE-3191.patch

Small further patch for trunk:

  * Simplifies the API by moving shardIndex onto ScoreDoc

  * Fixes TopDocs.merge to return TopFieldDocs if the Sort != null

  * A couple FieldComparators must override compareValue because the
    values may be null.


> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047440#comment-13047440 ] 

Michael McCandless commented on LUCENE-3191:
--------------------------------------------

Uwe, you are right!  Now why didn't I think of that...

The returned Comparable should be expected to properly compare itself to any other Comparable returned from FieldComparator.value... so I'll do that and then the patch is nice and small.  And no API change for 3.x.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3191.
-----------------------------------

    Resolution: Fixed

Committed fix trunk rev 1140713

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3191:
----------------------------------

    Attachment: LUCENE-3191-SlowCollatorCompareFix.patch

New patch that also checks the getValue/compareValues methods in the comparator.

Sorry, my earlier comment about too small indexes was wrong. The indexes in fact have segments. The problem was that I was irritated by the former MultiSearcher code that used these compareValues() code. In trunk, only TopDocs.merge() does this.

So I changed the test:
- It does not use MatchAllDocsQuery, instead two TermRangeQueries with a random center point, so it splits the index into two parts. Each result is sorted and checked that its valid.
- With the two result sets it calls TopDocs.merge() and again checks the result. This call will invoke the value methods and here the test fails as Robert said.
- For each search call it does *not* always set the maximum number of docs as PQ size (else compareBottom/setBottom) are not tested. It reduces the PQ size to a fraction of numDocs for all searches.

In general we should add similar tests to all custom comparators in Lucene's code, because the comparator can only be tested completely if you also check merging TopDocs.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047477#comment-13047477 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

This still confuses me:

bq. There might only be a bug in RelevanceComparator: Its getValue() method returns a comparable that sorts in wrong order. We have no test for this, so it might never cause a test failure.

In my opinion, it should return a negative Float object. But as far as I know, there was/is already some special case in the collectors merge code used to merge segment's results (FieldValueHitQueue.fillFields copys the values into the collected docs, but I am not sure if this is still used.

The good old deprecated FieldDocSortedHitQueue in 3.x (what's the replacement?) contains this special case:

{code}
} else {
  c = docA.fields[i].compareTo(docB.fields[i]);
  if (type == SortField.SCORE) {
    c = -c;
  }
}
{code}

In trunk it's gone, so we can maybe fix this stupidness. The Comparable returned by RelevanceComparator (used with SortField.SCORE) should simply be negative? Else we have to add this special case in your TopDocs.merge, too.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049908#comment-13049908 ] 

Michael McCandless commented on LUCENE-3191:
--------------------------------------------

For 3.x, I think we should make an exception to back-compat and break the API (changing FieldComp.value(..) to return <T> not Comparable; changing FieldDoc.fields from Comparable[] to Object[]).  I'll advertise the break in CHANGES.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056625#comment-13056625 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

Mike: This is a good idea, not sure if this could break scores, ranking and other things :-)

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047781#comment-13047781 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

Looks fine, I am happy now :-)

The RelevanceComparator should use simply to minimize unboxing:

{code}
+    public int compareValues(Float first, Float second) {
+      return second.compareTo(first); // reverse!
+    }
{code}

Will review more closely tomorrow!

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056629#comment-13056629 ] 

Robert Muir commented on LUCENE-3191:
-------------------------------------

also remember, when you call newSearcher, it optionally takes a default boolean parameter (maybeWrap, default=true).

So tests sensitive to stuff like this can just pass false here... in general I think beefing up newSearcher with stuff like this is a good idea, so far adding logic here (randomly use executor, randomly use slowwrapper, etc) has always found some bugs... eventually


> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047416#comment-13047416 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

bq. The basic idea is simple (use PQ to find top N across all shards), but, I had to add FieldComparator.compare(Comparable, Comparable).

That makes no sense to me, because Comparable<?> can always compare against each other without a separate comparator. The old MultiSearcher did exactly do this. This is why it returns Comparable<?>. So instead FieldComparator.compare(a, b) just use a.compareTo(b). It's in the responsibility of the Comparator to return a correctly wrapped Comparable.

There might only be a bug in RelevanceComparator: Its getValue() method returns a comparable that sorts in wrong order. We have no test for this, so it might never cause a test failure.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056339#comment-13056339 ] 

Uwe Schindler edited comment on LUCENE-3191 at 6/28/11 6:37 AM:
----------------------------------------------------------------

bq. I haven't tracked this issue to understand whats going on here, but you cannot pass BytesRefs to collator.compare. If this code is ever reached (and looking at the test i wrote for this damn thing, its unclear if this code is even necessary?!)

The code is necessary once you cross segments during compare. Maybe the index in the tests is so small that it never has multiple segments.

      was (Author: thetaphi):
    I haven't tracked this issue to understand whats going on here, but you cannot pass BytesRefs to collator.compare. If this code is ever reached (and looking at the test i wrote for this damn thing, its unclear if this code is even necessary?!)

The code is necessary once you cross segments during compare. Maybe the index in the tests is so small that it never has multiple segments.
  
> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056339#comment-13056339 ] 

Uwe Schindler edited comment on LUCENE-3191 at 6/28/11 6:44 AM:
----------------------------------------------------------------

bq. I haven't tracked this issue to understand whats going on here, but you cannot pass BytesRefs to collator.compare. If this code is ever reached (and looking at the test i wrote for this damn thing, its unclear if this code is even necessary?!)

The code is necessary once you cross segments during compare. Maybe the index in the tests is so small that it never has multiple segments.

The clover report shows that this is the case:
[https://builds.apache.org/job/Lucene-trunk/clover/org/apache/lucene/search/SlowCollatedStringComparator.html]

It never ever calls even compareBottom() so this index is really small. We should make it bigger to test correctly. The getValue() method is only called because the actual String (previously: BytesRef) values are placed in TopFieldDocs.

      was (Author: thetaphi):
    bq. I haven't tracked this issue to understand whats going on here, but you cannot pass BytesRefs to collator.compare. If this code is ever reached (and looking at the test i wrote for this damn thing, its unclear if this code is even necessary?!)

The code is necessary once you cross segments during compare. Maybe the index in the tests is so small that it never has multiple segments.
  
> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-3191:
------------------------------------------

    Assignee: Michael McCandless

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049932#comment-13049932 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

Patch looks good, let the BackwardsPoliceman think about some possibilities to lower the risk of breaking code. Of course nothing sophisticated...

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Reopened] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reopened LUCENE-3191:
---------------------------------


Reopening: this code in SlowCollatedStringComparator is totally broken:
{noformat}
  @Override
  public int compareValues(BytesRef first, BytesRef second) {
    if (first == null) {
      if (second == null) {
        return 0;
      }
      return -1;
    } else if (second == null) {
      return 1;
    } else {
      return collator.compare(first, second);
    }
  }
{noformat}

I haven't tracked this issue to understand whats going on here, but you cannot pass BytesRefs to collator.compare. If this code is ever reached (and looking at the test i wrote for this damn thing, its unclear if this code is even necessary?!), it *will* throw ClassCastException:
http://download.oracle.com/javase/1,5.0/docs/api/java/text/Collator.html#compare(java.lang.Object, java.lang.Object)


> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047479#comment-13047479 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

By the way, in current trunk the value() method in FieldComparator is obsolete and slows down search, if the field values are not needed. But of course, this patch makes use of it again, but we should correct it.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3191:
---------------------------------------

    Attachment: LUCENE-3191.patch

Patch.

The basic idea is simple (use PQ to find top N across all shards),
but, I had to add FieldComparator.compare(Comparable, Comparable).
Ie, the FieldComparator should be able to compare the Comparables
returned by its value method.


> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047748#comment-13047748 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

We had some discussions about cleaning this up in IRC: [http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2011-06-10#l235]

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056452#comment-13056452 ] 

Robert Muir commented on LUCENE-3191:
-------------------------------------

ahh thank you!!! I was very confused at why this method was untested!

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3191:
---------------------------------------

    Attachment: LUCENE-3191.patch

New patch, addresses feedback from Uwe (thanks!), and also adds
merging for TopGroups and SearchGroup to make distributed grouping
easy(ier).

I think it's ready to commit!


> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3191.
----------------------------------------

    Resolution: Fixed

Thanks Uwe!

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049928#comment-13049928 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

I think this has less impact on users. Two user types:

- People using FieldDoc.fields[] would always cast the return type, so a simple recompile should be fine
- People writing own FieldComparators must change return value of getValue() and maybe add generics (not required)
- People that dont implement compareValue() will be also fine, as the default impl casts to Comparable and that will have the same behaviour

The 3.x impl just have to fix FieldDocSortedHitQueue to use compareValue() and remove the negation for scores.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3191:
----------------------------------

    Attachment: LUCENE-3191-SlowCollatorCompareFix.patch

I looked at the code, it was ununderstandable why this thing was generified to BytesRef at all. As it compares String, getValue() and compareValue() should take Strings.

And thats done in the patch, its even a little bit no so slow :-)

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047472#comment-13047472 ] 

Michael McCandless commented on LUCENE-3191:
--------------------------------------------

So... I started down this path (relying on the returned Comparable
from .value to .compareTo themselves, instead of adding new .compare
method to FieldComp), but I'm not sure I like it...

I had to add a ReverseFloatComparable inside RelevanceComp, since it
sorts opposite natural Float sort order by default.

But then what this means, for an app that wants to do some sharding,
suddenly a TopDocs might contain an instance of this class, whereas
now it contains plain Java objects (Float, Integer, etc.).

I also don't like that this is splitting up the logic of how relevacne
scores compare to one another across two places (RelevanceComp and
this new ReverseFloatComparable).

I think it'd be better if we keep simple objects in the TopDocs, to
keep it easy for apps to serialize themselves (since we don't impl
Serializable anymore), and then the front end would invoke
RelevanceComparator locally to properly compare the floats.

Ie, really FieldComp.value should never have returned Comparable, I
think?


> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3191:
---------------------------------------

    Attachment: LUCENE-3191.patch

New patch, adds default impl for FC.compareValues to just cast to Comparable<T> and call .compareTo.  All but 2 places just use this default...

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056500#comment-13056500 ] 

Michael McCandless commented on LUCENE-3191:
--------------------------------------------

Thanks Uwe and Robert -- good catch!  Scary.

Maybe, to better test TopDocs.merge, we could fix newSearcher (AssertingIndexSearcher) to, sometimes, break into N searches on the sub-searchers and then merge the results, from its search methods?

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3191:
---------------------------------------

    Attachment: LUCENE-3191-3x.patch

Patch for merging back to 3.x.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047877#comment-13047877 ] 

Michael McCandless commented on LUCENE-3191:
--------------------------------------------

Thanks Uwe; I'll fix that.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056339#comment-13056339 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

I haven't tracked this issue to understand whats going on here, but you cannot pass BytesRefs to collator.compare. If this code is ever reached (and looking at the test i wrote for this damn thing, its unclear if this code is even necessary?!)

The code is necessary once you cross segments during compare. Maybe the index in the tests is so small that it never has multiple segments.

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056631#comment-13056631 ] 

Michael McCandless commented on LUCENE-3191:
--------------------------------------------

I added this to the TestIdeas wiki page.


> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3191:
----------------------------------

    Comment: was deleted

(was: This code was never tested, else we would have got this exception already? Let's remove this thing.

To fix, it should be enough to do a slow collator.compare(first.utf8ToString(), second.utf8ToString())?)

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056632#comment-13056632 ] 

Michael McCandless commented on LUCENE-3191:
--------------------------------------------

This can be done w/o breaking scores -- we can create a top-level weight, and then search against the subs (this is how TestTopDocs works).

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191-SlowCollatorCompareFix.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3191:
---------------------------------------

    Attachment: LUCENE-3191.patch

New patch:

  * Changes .value from Comparator (which is trappy because you think
    you're free to .compareTo them) to parameterized type passed to
    FieldComparator.

  * Renames .compare -> .compareValues, which are now type checked w/
    generic.

  * Changes FieldDoc.fields from Comparable[] to Object[]

Will need to work out how we backport this to 3.x; the change from
Comparable to Object is an API break, though... maybe not many apps
are using FieldDoc.field.


> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056331#comment-13056331 ] 

Uwe Schindler commented on LUCENE-3191:
---------------------------------------

This code was never tested, else we would have got this exception already? Let's remove this thing.

To fix, it should be enough to do a slow collator.compare(first.utf8ToString(), second.utf8ToString())?

> Add TopDocs.merge to merge multiple TopDocs
> -------------------------------------------
>
>                 Key: LUCENE-3191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3191
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org