You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/03/14 15:03:33 UTC

[jira] [Created] (LUCENE-7101) OfflineSorter's merging is O(N^2) cost for large sorts

Michael McCandless created LUCENE-7101:
------------------------------------------

             Summary: OfflineSorter's merging is O(N^2) cost for large sorts
                 Key: LUCENE-7101
                 URL: https://issues.apache.org/jira/browse/LUCENE-7101
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Blocker
             Fix For: master, 6.0


Our {{OfflineSorter}} acts just like Lucene, writing small initial
segments of sorted values (from what it was able to sort at once in
heap), periodically merging them when there are too many, and doing a
{{forceMerge(1)}} in the end.

But the merge logic is too simplistic today, resulting in O(N^2)
cost.  Smallish sorts usually won't hit it, because the default 128
merge factor is so high, but e.g. the new 2B points tests do hit the
N^2 behavior.  I suspect the high merge factor hurts performance (OS
struggles to use what free RAM it has to read-ahead on 128 files), and
also risks file descriptor exhaustion.

I think we should implement a simple log merge policy for it, and drop
its default merge factor to 10.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org