Posted to issues@lucene.apache.org by "Gautam Worah (Jira)" <ji...@apache.org> on 2020/11/17 18:50:00 UTC

[jira] [Comment Edited] (LUCENE-9378) Configurable compression for BinaryDocValues

    [ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233826#comment-17233826 ] 

Gautam Worah edited comment on LUCENE-9378 at 11/17/20, 6:49 PM:
-----------------------------------------------------------------

Thanks for catching that regression so quickly, [~jpountz]!

I recently made a change in [LUCENE-9450|https://github.com/apache/lucene-solr/pull/1733/] that switched taxonomy facets from StoredFields to BinaryDocValues, and that could have caused the drop in performance. Benchmark runs for my change had shown a 4.7% improvement in {{BrowseDayOfYearTaxoFacets}}, but I also made some changes after the benchmarks were run.

Meanwhile, I'll open an issue to disable compression for BinaryDocValues in faceting operations on Lucene's nightly benchmarks!

 


was (Author: gworah):
Thanks for catching that regression so quickly Adrien!

I had recently made a change in [LUCENE-9450|https://github.com/apache/lucene-solr/pull/1733/] that had switched taxonomy facets to use BinaryDocValues from StoredFields and that could have caused the drop in performance. [Benchmark runs|https://issues.apache.org/jira/browse/LUCENE-9450?focusedCommentId=17180860&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17180860] for my change had shown a 4.7% improvement in {{BrowseDayOfYearTaxoFacets}} but I had also made some changes after the benchmarks were run.

Meanwhile, I'll open an issue to disable compression for BinaryDocValues in faceting operations on Lucene's nightly benchmarks!

 

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Major
>             Fix For: 8.8
>
>         Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be able to opt in to this compression instead of having it always enabled, since it can carry a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED and UNCOMPRESSED) and let users create a custom Codec that subclasses the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
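
To make the proposed design concrete, here is a minimal, self-contained sketch of the pattern described in the issue: a format that takes a mode, and a codec subclass that picks the mode it wants. It is a toy model, not Lucene code. The names {{DocValuesModeSketch}}, {{BinaryValuesFormat}}, {{DefaultCodec}} and {{UncompressedCodec}} are hypothetical, DEFLATE from {{java.util.zip}} merely stands in for whatever compression the real format uses, and the API that eventually ships in Lucene may look different.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model only: none of these types are Lucene classes.
final class DocValuesModeSketch {

    /** Hypothetical mode switch, mirroring the COMPRESSED / UNCOMPRESSED idea. */
    enum Mode { COMPRESSED, UNCOMPRESSED }

    /** Hypothetical stand-in for a doc values format configured with a mode. */
    static final class BinaryValuesFormat {
        private final Mode mode;

        BinaryValuesFormat(Mode mode) { this.mode = mode; }

        Mode mode() { return mode; }

        /** Encodes one binary value; only COMPRESSED pays the compression cost. */
        byte[] encode(byte[] value) {
            if (mode == Mode.UNCOMPRESSED) {
                return value.clone();
            }
            Deflater deflater = new Deflater();
            deflater.setInput(value);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[256];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            deflater.end();
            return out.toByteArray();
        }

        /** Decodes one value; only COMPRESSED pays the decompression cost at read time. */
        byte[] decode(byte[] stored) {
            if (mode == Mode.UNCOMPRESSED) {
                return stored.clone();
            }
            Inflater inflater = new Inflater();
            inflater.setInput(stored);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[256];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buffer);
                    if (n == 0 && inflater.needsInput()) {
                        throw new IllegalStateException("truncated input");
                    }
                    out.write(buffer, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt input", e);
            } finally {
                inflater.end();
            }
            return out.toByteArray();
        }
    }

    /** Hypothetical default codec: always compresses, as in the behavior the issue describes. */
    static class DefaultCodec {
        BinaryValuesFormat docValuesFormat() {
            return new BinaryValuesFormat(Mode.COMPRESSED);
        }
    }

    /** User-defined subclass that opts out of compression. */
    static final class UncompressedCodec extends DefaultCodec {
        @Override
        BinaryValuesFormat docValuesFormat() {
            return new BinaryValuesFormat(Mode.UNCOMPRESSED);
        }
    }

    public static void main(String[] args) {
        byte[] value = "facet/path/facet/path/facet/path".getBytes(StandardCharsets.UTF_8);
        for (DefaultCodec codec : new DefaultCodec[] { new DefaultCodec(), new UncompressedCodec() }) {
            BinaryValuesFormat format = codec.docValuesFormat();
            byte[] stored = format.encode(value);
            byte[] decoded = format.decode(stored);
            System.out.println(format.mode() + ": stored " + stored.length
                + " bytes, decoded " + decoded.length + " bytes");
        }
    }
}
```

The point of the sketch is that the choice is made once, where the codec is constructed, so a read-heavy application could trade index size for query-time speed without patching the format itself.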


