Posted to issues@lucene.apache.org by "Mayya Sharipova (Jira)" <ji...@apache.org> on 2021/06/23 13:39:03 UTC

[jira] [Closed] (LUCENE-9663) Adding compression to terms dict from SortedSet/Sorted DocValues

     [ https://issues.apache.org/jira/browse/LUCENE-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayya Sharipova closed LUCENE-9663.
-----------------------------------

Closing after the 8.9.0 release

> Adding compression to terms dict from SortedSet/Sorted DocValues
> ----------------------------------------------------------------
>
>                 Key: LUCENE-9663
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9663
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Jaison.Bi
>            Priority: Trivial
>             Fix For: 8.9
>
>          Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> The Elasticsearch keyword field uses SortedSet DocValues. In our applications, “keyword” is the most frequently used field type.
>  LUCENE-7081 added prefix-compression for the docvalues terms dict. We can do better by replacing prefix-compression with LZ4. In one of our applications, the dvd files were ~41% smaller with this change (from 1.95 GB to 1.15 GB).
>  I ran simple tests on real application data, comparing the write and merge time cost and the on-disk *.dvd file size (after merging into 1 segment).
> || ||Before||After||
> |Write time cost (ms)|591972|618200|
> |Merge time cost (ms)|270661|294663|
> |*.dvd file size (GB)|1.95|1.15|
> This feature is only for high-cardinality fields.
>  I'm running a benchmark based on luceneutil and will attach the report and patch once it finishes.
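
The trade-off in the table above can be made explicit by computing the relative change for each row. The following is a minimal sketch that only redoes the arithmetic on the reported figures; the class and method names are hypothetical and are not part of Lucene or of the patch.

```java
import java.util.Locale;

public class DocValuesCompressionStats {

    /** Relative change from before to after, in percent (negative means a reduction). */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the table in the issue description.
        System.out.printf(Locale.ROOT, "write time: %+.1f%%%n", percentChange(591972, 618200));
        System.out.printf(Locale.ROOT, "merge time: %+.1f%%%n", percentChange(270661, 294663));
        System.out.printf(Locale.ROOT, "dvd size:   %+.1f%%%n", percentChange(1.95, 1.15));
    }
}
```

On the reported numbers this works out to roughly 4.4% more write time and 8.9% more merge time in exchange for a 41% smaller *.dvd file, which matches the "~41%" figure quoted in the description.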



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
