Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2016/06/23 23:53:16 UTC

[jira] [Resolved] (KUDU-1398) CFile index blocks can store shortest separating prefix

     [ https://issues.apache.org/jira/browse/KUDU-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Berkeley resolved KUDU-1398.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.0.0

> CFile index blocks can store shortest separating prefix
> -------------------------------------------------------
>
>                 Key: KUDU-1398
>                 URL: https://issues.apache.org/jira/browse/KUDU-1398
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile, perf
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Will Berkeley
>             Fix For: 1.0.0
>
>
> Currently, the cfile value index blocks store the entire first value of each data block. This is not actually necessary -- we only need to store the shortest string that sorts after the last key of the previous block and no later than the first key of this block. For example:
> Data block 1: apple,banana,cardamom
> Data block 2: carrot,epazote,fennel
> Today we would store:
> Index block entries: ['apple' -> block 1, 'carrot' -> block 2]
> Minimally, we can store:
> Index block entries: ['' -> block 1, 'care' -> block 2]
> In this example only a few bytes are saved, but with longer key strings the savings can be substantial. For example, if the key is a uniformly distributed 36-byte UUID and we have 1000 x 32KB data blocks in a 32MB cfile, we can probably shorten the index entries to only 2-3 bytes on average, a big saving.
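
The idea in the description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kudu's actual C++ implementation: the names shortest_separator, build_index and find_block are hypothetical, keys are compared as raw bytes, and keys are assumed to be strictly increasing across block boundaries. The sketch takes the shortest prefix of the new block's first key that sorts after the previous block's last key, so for the example it yields 'carr' rather than 'care'; both are 4 bytes, and both are valid separators.

```python
import bisect


def shortest_separator(prev_last: bytes, cur_first: bytes) -> bytes:
    """Return a shortest s such that prev_last < s <= cur_first (bytewise order)."""
    if not prev_last < cur_first:
        # Assumption of this sketch: no duplicate keys across a block boundary.
        raise ValueError("keys must be strictly increasing across blocks")
    # Length of the common prefix of the two keys.
    n = 0
    while n < len(prev_last) and prev_last[n] == cur_first[n]:
        n += 1
    # One byte past the common prefix is enough to sort after prev_last,
    # and a prefix of cur_first can never sort after cur_first.
    return cur_first[: n + 1]


def build_index(blocks):
    """Build [(separator, block_number), ...] from a list of sorted key lists."""
    index = []
    prev_last = None
    for block_no, keys in enumerate(blocks, start=1):
        # The first block needs no lower bound, so the empty string suffices.
        sep = b"" if prev_last is None else shortest_separator(prev_last, keys[0])
        index.append((sep, block_no))
        prev_last = keys[-1]
    return index


def find_block(index, key: bytes) -> int:
    """Return the number of the block that may contain key."""
    seps = [sep for sep, _ in index]
    # Last entry whose separator is <= key.
    pos = bisect.bisect_right(seps, key) - 1
    return index[pos][1]


blocks = [
    [b"apple", b"banana", b"cardamom"],
    [b"carrot", b"epazote", b"fennel"],
]
index = build_index(blocks)
print(index)  # [(b'', 1), (b'carr', 2)]
```

Lookups behave the same as with full keys in the index: find_block(index, b"cardamom") still resolves to block 1 and find_block(index, b"carrot") to block 2, because each separator sorts strictly after every key in the preceding block.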


