You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/06/19 01:43:00 UTC

[jira] [Created] (LUCENE-6585) Make ConjunctionDISI flatten sub ConjunctionDISI instances

Adrien Grand created LUCENE-6585:
------------------------------------

             Summary: Make ConjunctionDISI flatten sub ConjunctionDISI instances
                 Key: LUCENE-6585
                 URL: https://issues.apache.org/jira/browse/LUCENE-6585
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand
            Priority: Minor


Today ConjunctionDISI wraps some sub (two-phase) iterators. I would like to improve it by flattening sub iterators when they implement ConjunctionDISI. In practice, this would make "+A +(+B +C)" be executed more like "+A +B +C" (only in terms of matching, scoring would not change).

My motivation for this is that if we don't flatten and are unlucky, we can sometimes hit some worst cases. For instance consider the 3 following postings lists (sorted by increasing cost):

A: 1, 1001, 2001, 3001, ...
C: 0, 2, 4, 6, 8, 10, 12, 14, ...
B: 1, 3, 5, 7, 9, 11, 13, 15, ...

If we run "+A +B +C", then everything works fine, we use A as a lead, and advance B 1000 by 1000 to find the next match (if any).

However if we run "+A +(+B +C)", then we would iterate B and C 2 by 2 over the entire doc ID space when trying to find the first match which occurs on or after A:1.

This is an extreme example which is unlikely to happen in practice, but flattening would also help a bit on some more common cases. For instance imagine that A, B and C have respective costs of 100, 10 and 1000. If you search for "+A +(+B +C)", then we will use the most costly iterator (C) to confirm matches of B (the least costly iterator, used as a lead) while it would have been more efficient to confirm matches of B with A first, since A is less costly than C.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org