Posted to dev@lucene.apache.org by "Emmanuel Keller (JIRA)" <ji...@apache.org> on 2017/01/07 10:47:58 UTC
[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation
[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807272#comment-15807272 ]
Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:47 AM:
------------------------------------------------------------------
The test expects the retrieved ScoreDoc array to be in a specific order. In this test, the scores are identical for all documents.
Because we are using a multithreaded map/reduce design, we can't expect the order to be preserved among documents with equal scores.
[~mikemccand] am I right?
IMHO, the equality check must be modified to only check that the documents are present with the same score.
Here is the current check for the ScoreDoc array:
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println(" hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}
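For illustration only, an order-insensitive version of this check could look roughly like the sketch below. It is not taken from the patch: the Hit class and the sameHitsIgnoringOrder helper are hypothetical stand-ins for the test's own structures, and the sketch assumes that document ids are unique within a result set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one hit: the stored "id" field plus its score.
final class Hit {
    final String id;
    final float score;

    Hit(String id, float score) {
        this.id = id;
        this.score = score;
    }
}

final class HitComparison {
    /** Returns true when both lists hold the same ids with identical scores, in any order. */
    static boolean sameHitsIgnoringOrder(List<Hit> expected, List<Hit> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        Map<String, Float> expectedScores = new HashMap<>();
        for (Hit h : expected) {
            expectedScores.put(h.id, h.score);
        }
        // Assumption: ids are unique. A duplicate id would shrink the map.
        if (expectedScores.size() != expected.size()) {
            return false;
        }
        for (Hit h : actual) {
            // Remove each matched id so that a repeated id in "actual" fails.
            Float score = expectedScores.remove(h.id);
            if (score == null || Float.compare(score, h.score) != 0) {
                return false;
            }
        }
        return expectedScores.isEmpty();
    }
}
```

In the real test the same idea would probably be expressed with assertions on the id-to-score map rather than a boolean helper, and hits with different scores would still need their relative order checked.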
> A parallel DrillSideways implementation
> ---------------------------------------
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch
>
>
> Currently the DrillSideways implementation is based on the single-threaded IndexSearcher.search(Query query, Collector results).
> On a large document set, single-threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query, CollectorManager collectorManager) to get the benefits of multithreading on index segments,
> 2. Compute each DrillSideways subquery on a single thread.
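
As a rough illustration of the map/reduce shape described above, here is a plain-Java sketch that does not use Lucene at all. The ParallelCountSketch class, its countMatches method, and the representation of a "segment" as an int array are all hypothetical. They only mirror the idea behind CollectorManager, which creates one collector per unit of work (newCollector) and merges the partial results afterwards (reduce).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelCountSketch {
    /** Counts matching values per segment in parallel, then sums the partial counts. */
    static int countMatches(List<int[]> segments, int wanted, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Map step: one task per segment, each with its own private counter.
            List<Future<Integer>> partials = new ArrayList<>();
            for (int[] segment : segments) {
                partials.add(pool.submit(() -> {
                    int count = 0;
                    for (int value : segment) {
                        if (value == wanted) {
                            count++;
                        }
                    }
                    return count;
                }));
            }
            // Reduce step: merge the per-segment results on the calling thread.
            int total = 0;
            for (Future<Integer> partial : partials) {
                try {
                    total += partial.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("segment task failed", e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

A sum is insensitive to the order in which segments finish, but a merged hit list is not unless the reduce step sorts with a deterministic tie-break, which is why the ordering question arises for documents with equal scores.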