Posted to issues@lucene.apache.org by "Alex Klibisz (Jira)" <ji...@apache.org> on 2020/06/13 02:26:00 UTC

[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

    [ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134629#comment-17134629 ] 

Alex Klibisz commented on LUCENE-9378:
--------------------------------------

Hi, just as another datapoint:

I'm using BinaryDocValues to store vectors for this Elasticsearch plugin: [https://github.com/alexklibisz/elastiknn]. The use case is actually very similar to what [~sokolov] described. I saw a large performance regression after switching from Elasticsearch 7.6.x to 7.7.x, which introduces Lucene 8.5.0.

For instance, here are two screenshots from VisualVM running the same benchmark on 7.6.x and then 7.7.x.

7.7.x spends a lot more time in the `decompress` method, which actually overtakes the `sortedIntersectionCount` method that was previously the most expensive.

!image-2020-06-12-22-18-48-919.png|width=732,height=50!

!image-2020-06-12-22-18-24-527.png!

Note that this is also comparing Oracle JDK 13 (7.6.x) to Oracle JDK 14 (7.7.x). As a sanity check, I benchmarked the sortedIntersectionCount independently and it did get faster after the JDK switch.

I can provide more detailed info if necessary.

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>         Attachments: image-2020-06-12-22-17-30-339.png, image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, image-2020-06-12-22-18-48-919.png
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
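
For illustration only, the mode idea from the description above could be modeled roughly as follows. This is a stdlib-only sketch, not Lucene code: the class `BinaryValueModeSketch`, its `Mode` enum, and the `encode`/`decode`/`vectorToBytes` helpers are all hypothetical names, and `java.util.zip` DEFLATE is only a stand-in for whatever compression the real doc values format applies.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical model of a per-format compression mode. None of these names are Lucene APIs.
class BinaryValueModeSketch {

    // Mirrors the two modes proposed in the issue description.
    enum Mode { COMPRESSED, UNCOMPRESSED }

    // Write path: store the value as-is, or compress it (DEFLATE is an assumed stand-in).
    static byte[] encode(byte[] value, Mode mode) {
        if (mode == Mode.UNCOMPRESSED) {
            return value.clone();
        }
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Read path: the UNCOMPRESSED branch is a plain copy; the COMPRESSED branch
    // pays a decompression cost on every read.
    static byte[] decode(byte[] stored, int originalLength, Mode mode) {
        if (mode == Mode.UNCOMPRESSED) {
            return stored.clone();
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        byte[] out = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // input ended before the expected length was produced
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed value", e);
        } finally {
            inflater.end();
        }
        if (off != originalLength) {
            throw new IllegalStateException("expected " + originalLength + " bytes, got " + off);
        }
        return out;
    }

    // Packs a float vector into bytes, similar in spirit to storing vectors as binary values.
    static byte[] vectorToBytes(float[] vector) {
        ByteBuffer bb = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (float f : vector) {
            bb.putFloat(f);
        }
        return bb.array();
    }
}
```

In an actual Lucene change, the mode would presumably be a parameter of the doc values format, chosen through a custom Codec as the description suggests; the sketch only shows why the read path differs between the two modes.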


