You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Greg Miller (Jira)" <ji...@apache.org> on 2022/03/04 14:23:00 UTC

[jira] [Comment Edited] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

    [ https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500898#comment-17500898 ] 

Greg Miller edited comment on LUCENE-10302 at 3/4/22, 2:22 PM:
---------------------------------------------------------------

Thanks [~dsmiley]. I'd sketched something out as well and have it sitting over here on a branch: [https://github.com/gsmiller/lucene/tree/LUCENE-10302-pq-builder-sketch|https://github.com/gsmiller/lucene/tree/LUCENE-10302-pq-builder-sketch.] I'll have a look at your patch file and see where ideas are similar/different.

 

[~vigyas] sounds good!


was (Author: gsmiller):
Thanks [~dsmiley]. I'd sketched something out as well and have it sitting over here on a branch: [https://github.com/gsmiller/lucene/tree/LUCENE-10302-pq-builder-sketch.] I'll have a look at your patch file and see where ideas are similar/different.

 

[~vigyas] sounds good!

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-10302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10302
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue when we provide a bulk array to initialize the heap/PriorityQueue.  It turns out there is: the JDK's PriorityQueue supports this in its constructors, referring to "This classic algorithm due to Floyd (1964) is known to be O(size)" -- heapify() method.  There's [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may or may not be the same; I didn't look too closely yet.  I see a number of uses of Lucene's PriorityQueue that first collects values and only after collecting want to do something with the results (typical / unsurprising).  This lends itself to a builder pattern that can look similar to LargeNumHitsTopDocsCollector in terms of first having an array used like a list and then move over to the PriorityQueue if/when it gets full (it may not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org