You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2016/05/24 13:25:12 UTC

[jira] [Updated] (LUCENE-7299) BytesRefHash.sort() should use radix sort?

     [ https://issues.apache.org/jira/browse/LUCENE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-7299:
---------------------------------
    Attachment: LUCENE-7299.patch

Here is a patch. I ran a simple benchmark with 10M docs, a single thread, a single indexed field and a largish ram buffer of 500 MB. On random binary keys of length 16, indexing speed improved by 37%. On random ascii keys whose length is between 0 and 16, indexing speed improved by 34%.

This is not a realistic speedup since there is no merging involved (the ram buffer can hold everything), no stored fields, no doc values, etc. - but I think this is still interesting to be able to generate the inverted index more quickly for flushed segments. There will probably be a noticeable speedup with some setups that make heavy use of the inverted index of high-cardinality fields with large ram buffers.

> BytesRefHash.sort() should use radix sort?
> ------------------------------------------
>
>                 Key: LUCENE-7299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7299.patch
>
>
> Switching DocIdSetBuilder to radix sort helped make things significantly faster. We should be able to do the same with BytesRefHash.sort()?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org