Posted to issues@lucene.apache.org by "Michael McCandless (Jira)" <ji...@apache.org> on 2020/11/06 16:14:00 UTC

[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

    [ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227469#comment-17227469 ] 

Michael McCandless commented on LUCENE-9378:
--------------------------------------------

{quote}I'm wondering if there is anything I should know about if/how it would be a bad idea to continue using the `Lucene70DocValuesFormat` until this issue is resolved?
{quote}
Just be aware that you don't have the backwards compatibility guarantee you would normally have.

E.g. in the future, when you upgrade to a Lucene 9.x release, it will not be able to read the indices you are creating with {{Lucene70DocValuesFormat}}.

Also, few people are fixing bugs in that format, though it is/was widely used so is likely nearly bug free ;)

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>         Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
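
The *mode* idea in the description above can be illustrated with a minimal, self-contained sketch. It does not use Lucene at all: the class name DocValuesModeSketch, its methods, and the one-byte mode marker are hypothetical, and java.util.zip is only a standard-library stand-in for whatever compression the real format applies. It shows one way a writer could be constructed with COMPRESSED or UNCOMPRESSED while the reader stays mode-agnostic.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only: none of these names are Lucene APIs.
public class DocValuesModeSketch {

    // Mirrors the two modes proposed in the issue description.
    public enum Mode { COMPRESSED, UNCOMPRESSED }

    private final Mode mode;

    public DocValuesModeSketch(Mode mode) {
        this.mode = mode;
    }

    // Encodes one binary value as a 1-byte mode marker followed by the payload.
    public byte[] encode(byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(mode.ordinal());
        if (mode == Mode.UNCOMPRESSED) {
            out.write(value, 0, value.length);
            return out.toByteArray();
        }
        // Assumption: Deflater stands in for the real compression scheme.
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(value);
        deflater.finish();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reads the marker first, so the reader does not need to know the writer's mode.
    public static byte[] decode(byte[] stored) {
        if (stored[0] == Mode.UNCOMPRESSED.ordinal()) {
            return Arrays.copyOfRange(stored, 1, stored.length);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored, 1, stored.length - 1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported payload");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed payload", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] value = new byte[1000]; // all zeros, so highly compressible
        byte[] small = new DocValuesModeSketch(Mode.COMPRESSED).encode(value);
        byte[] plain = new DocValuesModeSketch(Mode.UNCOMPRESSED).encode(value);
        System.out.println("compressed bytes:   " + small.length);
        System.out.println("uncompressed bytes: " + plain.length);
        System.out.println("round trip ok: " + Arrays.equals(value, decode(small)));
    }
}
```

In this sketch the trade-off from the issue shows up directly: UNCOMPRESSED decoding is a plain array copy, while COMPRESSED decoding pays for inflation on every read in exchange for a smaller stored size.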


