Posted to issues@lucene.apache.org by "Adrien Grand (Jira)" <ji...@apache.org> on 2022/05/12 08:38:00 UTC

[jira] [Resolved] (LUCENE-10536) Doc values terms dicts should use the first term of each block as a dictionary

     [ https://issues.apache.org/jira/browse/LUCENE-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-10536.
-----------------------------------
    Fix Version/s: 9.2
       Resolution: Fixed

> Doc values terms dicts should use the first term of each block as a dictionary
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-10536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10536
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 9.2
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Doc values terms dictionaries split data into blocks of 64 terms. The first term of each block is written uncompressed, which is useful for binary searches, and the other 63 terms are encoded as their difference from the previous term, with all suffixes compressed together using LZ4.
> With this format, the suffix of the second term is also unlikely to benefit from any compression, since LZ4 has no earlier data in which to search for duplicate bytes other than the suffix itself. A minor improvement would be to use the first term as a dictionary for the suffixes of terms 2..64.
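
The idea described above can be sketched roughly as follows. This is not Lucene's code: the function names, the 4-term sample block, and the side list of suffix lengths are all made up for illustration, and Python's zlib preset dictionary (zdict) stands in for LZ4, which is not in the Python standard library. The sketch prefix-codes each term against the previous one, then compresses the concatenated suffixes with the first term supplied as a dictionary.

```python
import zlib

# Hypothetical sorted block; a real block would hold 64 terms.
TERMS = [
    b"https://lucene.apache.org/core",
    b"https://lucene.apache.org/core/9_2_0",
    b"https://lucene.apache.org/pylucene",
    b"https://solr.apache.org/guide",
]

def encode_block(terms):
    """Return (first_term, prefix_lengths, suffixes) for one block."""
    first = terms[0]
    prefix_lengths, suffixes = [], []
    prev = first
    for term in terms[1:]:
        # Count the leading bytes shared with the previous term.
        n = 0
        while n < min(len(prev), len(term)) and prev[n] == term[n]:
            n += 1
        prefix_lengths.append(n)
        suffixes.append(term[n:])
        prev = term
    return first, prefix_lengths, suffixes

def compress_suffixes(suffixes, dictionary=None):
    """Compress all suffixes together, optionally with a preset dictionary."""
    data = b"".join(suffixes)
    if dictionary is not None:
        comp = zlib.compressobj(zdict=dictionary)
    else:
        comp = zlib.compressobj()
    return comp.compress(data) + comp.flush()

def decompress_suffixes(blob, dictionary=None):
    """Inverse of compress_suffixes; must be given the same dictionary."""
    if dictionary is not None:
        decomp = zlib.decompressobj(zdict=dictionary)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()

def decode_block(first, prefix_lengths, suffix_lengths, data):
    """Rebuild the terms from the first term and the decompressed suffixes."""
    terms, prev, pos = [first], first, 0
    for plen, slen in zip(prefix_lengths, suffix_lengths):
        term = prev[:plen] + data[pos:pos + slen]
        terms.append(term)
        prev, pos = term, pos + slen
    return terms

first, plens, suffixes = encode_block(TERMS)
plain = compress_suffixes(suffixes)
with_dict = compress_suffixes(suffixes, dictionary=first)
print("without dictionary:", len(plain), "bytes")
print("with first term as dictionary:", len(with_dict), "bytes")
```

Because the first term is already stored uncompressed at the start of the block, the reader has it available before decompressing, so using it as a dictionary adds no extra stored data. The sizes printed here depend on zlib's framing overhead and say nothing about the gains seen with LZ4 in Lucene.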


