Posted to dev@lucene.apache.org by "Emmanuel Keller (JIRA)" <ji...@apache.org> on 2017/01/07 10:47:58 UTC

[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

    [ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807272#comment-15807272 ] 

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:47 AM:
------------------------------------------------------------------

The test expects the retrieved ScoreDoc array to be ordered. In this test, the scores are identical for all documents.

As we are using a multithreaded map/reduce design, we can't expect the order to be preserved.
[~mikemccand] am I right?

IMHO, the equality check must be modified to only check that the documents are present with the same score.

Here is the current check for the ScoreDoc array:

{code:java}
    for (int i = 0; i < expected.hits.size(); i++) {
      if (VERBOSE) {
        System.out.println("    hit " + i + " expected=" + expected.hits.get(i).id);
      }
      assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
      // Score should be IDENTICAL:
      assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
    }
{code}
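
A minimal sketch of what an order-insensitive check could look like, using only the JDK. The class UnorderedHitsCheck, the Hit holder and the sameHits method are hypothetical names for illustration; they are not part of the test or of the patch. Hit stands in for the (id, score) pair that the test reads from each ScoreDoc, and ids are assumed to be unique within a result list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: compares two hit lists while ignoring their order. */
final class UnorderedHitsCheck {

  /** Hypothetical stand-in for the (id, score) pair read from a ScoreDoc. */
  static final class Hit {
    final String id;
    final float score;

    Hit(String id, float score) {
      this.id = id;
      this.score = score;
    }
  }

  /**
   * Returns true if both lists contain the same ids with the same scores,
   * whatever their positions. Scores are compared with Float.compare,
   * so no tolerance is applied.
   */
  static boolean sameHits(List<Hit> expected, List<Hit> actual) {
    if (expected.size() != actual.size()) {
      return false;
    }
    Map<String, Float> actualScores = new HashMap<>();
    for (Hit hit : actual) {
      if (actualScores.put(hit.id, hit.score) != null) {
        return false; // the same id appeared twice in the actual hits
      }
    }
    for (Hit hit : expected) {
      // remove() so that each actual hit can match at most one expected hit
      Float score = actualScores.remove(hit.id);
      if (score == null || Float.compare(score, hit.score) != 0) {
        return false; // missing document, or score differs
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Hit> expected = Arrays.asList(new Hit("a", 1.0f), new Hit("b", 1.0f));
    List<Hit> reordered = Arrays.asList(new Hit("b", 1.0f), new Hit("a", 1.0f));
    System.out.println(sameHits(expected, reordered));
  }
}
```

In the test itself, the same idea would mean building the id-to-score map from actual.hits.scoreDocs and asserting on membership instead of on position i.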



> A parallel DrillSideways implementation
> ---------------------------------------
>
>                 Key: LUCENE-7588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7588
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (7.0), 6.3.1
>            Reporter: Emmanuel Keller
>            Priority: Minor
>              Labels: facet, faceting
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7588.patch
>
>
> Currently, the DrillSideways implementation is based on the single-threaded IndexSearcher.search(Query query, Collector results).
> On large document sets, single-threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query, CollectorManager collectorManager) to get the benefits of multithreading on index segments,
> 2. Compute each DrillSideways subquery on a single thread.
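
The map/reduce shape behind point 1 can be sketched with the JDK alone. This is an analogy under stated assumptions, not Lucene code: the Manager interface only mirrors the newCollector()/reduce(...) pair of Lucene's CollectorManager, and SliceSearchSketch, search and countHits are hypothetical names. Lists of integer doc ids stand in for index segments, and "collecting" a slice simply copies its ids.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: one collector per slice, filled in parallel, then reduced. */
final class SliceSearchSketch {

  /** Hypothetical stand-in for a collector manager (not Lucene's interface). */
  interface Manager<C, T> {
    C newCollector();

    T reduce(Collection<C> collectors);
  }

  /** Map step: one task per slice. Reduce step: merge the per-slice collectors. */
  static <C extends Collection<Integer>, T> T search(
      List<List<Integer>> slices, Manager<C, T> manager, ExecutorService executor) {
    List<Future<C>> futures = new ArrayList<>();
    for (List<Integer> slice : slices) {
      C collector = manager.newCollector(); // each slice gets its own collector
      futures.add(executor.submit(() -> {
        collector.addAll(slice); // "collect" every doc id of this slice
        return collector;
      }));
    }
    List<C> collectors = new ArrayList<>();
    for (Future<C> future : futures) {
      try {
        collectors.add(future.get()); // wait for the slice to finish
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
    return manager.reduce(collectors);
  }

  /** Example manager: counts the hits collected across all slices. */
  static int countHits(List<List<Integer>> slices, int threads) {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      return search(slices, new Manager<List<Integer>, Integer>() {
        @Override
        public List<Integer> newCollector() {
          return new ArrayList<>();
        }

        @Override
        public Integer reduce(Collection<List<Integer>> collectors) {
          int total = 0;
          for (List<Integer> collected : collectors) {
            total += collected.size();
          }
          return total;
        }
      }, executor);
    } finally {
      executor.shutdown();
    }
  }
}
```

Because every slice writes to its own collector, no locking is needed during collection; all merging happens once, in reduce.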


