Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/12/23 22:21:00 UTC

[jira] [Resolved] (IMPALA-340) Improve internal format of strings

     [ https://issues.apache.org/jira/browse/IMPALA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-340.
----------------------------------
    Resolution: Later

Not a bad idea, but we don't need to track this separately.

> Improve internal format of strings
> ----------------------------------
>
>                 Key: IMPALA-340
>                 URL: https://issues.apache.org/jira/browse/IMPALA-340
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 1.0
>            Reporter: Nong Li
>            Priority: Minor
>              Labels: perfomance
>
> We currently store string data outside of a Tuple, with the string slot taking up 16 bytes (4 bytes length, 8 bytes pointer, -4 bytes padding- (UPDATE: IMPALA-7367 removed the padding)), which is hugely wasteful.
> We need 2 improvements:
> * a more compact string slot: Intel architectures only use 48 bits of a 64-bit address; strings are usually smaller than 64K; if the latter holds, we should pack a string slot into 64 bits total
> * in-line representation of strings: schemas we've seen often use strings as ids (which then also show up as foreign keys and are used heavily in joins), and those are typically smaller than 8 bytes; in that case, we could simply store the actual data in the string slot itself
> See benchmarks/string-benchmark.cc.
> See IMP-148 for more details.
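
The two proposals in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Impala's actual StringValue layout: every name here (PackSlot, SlotAddr, SlotLen, InlineStringSlot, StoreInline, LoadInline) is hypothetical. It assumes the address fits in the low 48 bits and the length fits in 16 bits, and it treats the packed and in-line forms as two separate layouts. Combining them in a single slot would need some tag to tell the forms apart, which the issue does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical packed slot: low 48 bits hold the address, high 16 bits the length.
constexpr int kAddrBits = 48;
constexpr uint64_t kAddrMask = (uint64_t{1} << kAddrBits) - 1;
constexpr uint64_t kMaxPackedLen = 0xFFFF;  // lengths below 64K fit in 16 bits

// Packs an address and a length into one 64-bit word.
// Returns false if either value does not fit; the caller would then need
// a fallback such as the wider 12-byte slot.
bool PackSlot(uint64_t addr, uint64_t len, uint64_t* slot) {
  if (len > kMaxPackedLen || (addr & ~kAddrMask) != 0) return false;
  *slot = (len << kAddrBits) | addr;
  return true;
}

uint64_t SlotAddr(uint64_t slot) { return slot & kAddrMask; }
uint32_t SlotLen(uint64_t slot) { return static_cast<uint32_t>(slot >> kAddrBits); }

// Convenience wrappers for real pointers.
bool PackPtr(const char* ptr, uint64_t len, uint64_t* slot) {
  return PackSlot(static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr)), len, slot);
}

const char* SlotPtr(uint64_t slot) {
  return reinterpret_cast<const char*>(static_cast<uintptr_t>(SlotAddr(slot)));
}

// Hypothetical in-line slot: strings shorter than 8 bytes live in the slot itself,
// with one byte for the length and seven bytes for the data.
constexpr size_t kMaxInlineLen = 7;

struct InlineStringSlot {
  uint8_t len;
  char data[kMaxInlineLen];
};
static_assert(sizeof(InlineStringSlot) == 8, "in-line slot must be 8 bytes");

// Copies a short string into the slot; returns false if it is too long.
bool StoreInline(const std::string& s, InlineStringSlot* out) {
  if (s.size() > kMaxInlineLen) return false;
  out->len = static_cast<uint8_t>(s.size());
  std::memset(out->data, 0, sizeof(out->data));  // zero the unused tail
  std::memcpy(out->data, s.data(), s.size());
  return true;
}

std::string LoadInline(const InlineStringSlot& slot) {
  return std::string(slot.data, slot.len);
}
```

Both forms take 8 bytes per slot instead of the 12 bytes (4-byte length plus 8-byte pointer) described above. The in-line form also avoids a pointer dereference when comparing short ids, which is where the description expects the benefit for joins.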



--
This message was sent by Atlassian Jira
(v8.3.4#803005)