Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2016/06/09 04:11:21 UTC
[jira] [Assigned] (KUDU-1398) CFile index blocks can store shortest separating prefix
[ https://issues.apache.org/jira/browse/KUDU-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Berkeley reassigned KUDU-1398:
-----------------------------------
Assignee: Will Berkeley
> CFile index blocks can store shortest separating prefix
> -------------------------------------------------------
>
> Key: KUDU-1398
> URL: https://issues.apache.org/jira/browse/KUDU-1398
> Project: Kudu
> Issue Type: Bug
> Components: cfile, perf
> Affects Versions: 0.8.0
> Reporter: Todd Lipcon
> Assignee: Will Berkeley
>
> Currently, the cfile value index blocks store the entire first value of each data block. This is not necessary: we only need to store the shortest string that is greater than the last key of the previous block and less than or equal to the first key of this block. For example:
> Data block 1: apple,banana,cardamom
> Data block 2: carrot,epazote,fennel
> Today we would store:
> Index block entries: ['apple' -> block 1, 'carrot' -> block 2]
> Minimally, we can store:
> Index block entries: ['' -> block 1, 'care' -> block 2]
> In this example only a few bytes are saved, but with longer keys the savings can be substantial. For example, if the keys are uniformly distributed 36-byte UUID strings and a 32MB cfile holds 1000 data blocks of 32KB each, we can probably shorten the index entries to only 2-3 bytes on average, a big savings.
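A minimal Python sketch of the idea, assuming keys compare as plain strings (byte order for ASCII keys) and that the data blocks are sorted, non-empty and hold unique keys. The names `shortest_separator`, `build_index` and `find_block` are hypothetical and only illustrate the technique; they are not Kudu's CFile API. At each block boundary a valid index key s must satisfy last-key-of-previous-block < s <= first-key-of-this-block, and a shortest such key is always one character longer than the common prefix of the two boundary keys. This is similar in spirit to LevelDB's `Comparator::FindShortestSeparator`, although LevelDB's bounds are [start, limit) rather than (previous, first].

```python
import bisect


def shortest_separator(prev_last: str, first: str) -> str:
    """Return a shortest key s with prev_last < s <= first (assumes prev_last < first)."""
    if not prev_last < first:
        raise ValueError("expected prev_last < first")
    # Length of the common prefix of the two boundary keys.
    i = 0
    while i < len(prev_last) and i < len(first) and prev_last[i] == first[i]:
        i += 1
    if i == len(prev_last):
        # prev_last is a proper prefix of first: take one more character of first.
        return first[: i + 1]
    # The keys diverge at position i with prev_last[i] < first[i]. Any character
    # in (prev_last[i], first[i]] works; bumping prev_last[i] by one yields
    # 'care' for the 'cardamom' / 'carrot' boundary in the issue.
    return prev_last[:i] + chr(ord(prev_last[i]) + 1)


def build_index(blocks: list[list[str]]) -> list[tuple[str, int]]:
    """Build (separator, block number) entries for globally sorted blocks, numbered from 1."""
    entries = []
    prev_last = None
    for block_no, block in enumerate(blocks, start=1):
        # The first block has no predecessor, so the empty key is enough.
        sep = "" if prev_last is None else shortest_separator(prev_last, block[0])
        entries.append((sep, block_no))
        prev_last = block[-1]
    return entries


def find_block(entries: list[tuple[str, int]], key: str) -> int:
    """Return the number of the only block that could contain key."""
    seps = [sep for sep, _ in entries]
    # Pick the last entry whose separator is <= key.
    pos = bisect.bisect_right(seps, key) - 1
    return entries[pos][1]


if __name__ == "__main__":
    blocks = [["apple", "banana", "cardamom"], ["carrot", "epazote", "fennel"]]
    index = build_index(blocks)
    print(index)  # [('', 1), ('care', 2)]
    print(find_block(index, "cardamom"), find_block(index, "carrot"))  # 1 2
```

Because each separator depends only on the two keys adjacent to a block boundary, it can be computed in a single pass as each data block is finished, without revisiting earlier blocks.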
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)