Posted to issues@impala.apache.org by "Pranay Singh (JIRA)" <ji...@apache.org> on 2018/01/11 17:56:00 UTC

[jira] [Resolved] (IMPALA-5522) Use tracked memory for DictDecoder and DictEncoder

     [ https://issues.apache.org/jira/browse/IMPALA-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranay Singh resolved IMPALA-5522.
----------------------------------
    Resolution: Fixed

The issue has been fixed by the check-in below.

IMPALA-5522: Use tracked memory for DictDecoder and DictEncoder

Currently the DictDecoder and DictEncoder classes use std::vector
to store the tables mapping codeword to value and vice versa. It is
hard to detect the memory usage of these tables when they become
very large, since this memory is not accounted for by Impala's memory
management infrastructure.

This patch uses the memory tracker of HdfsScanner to track the memory used
by the dictionary in the DictDecoder class. Similarly, it uses the memory
tracker of HdfsTableSink to track the memory used by the dictionary in the
DictEncoder class.

Memory for the dictionary, stored as a std::vector, is still allocated
from std::allocator, but the amount allocated is accounted for by
introducing a counter that is incremented and decremented as
memory is consumed and released by the vector.
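
The counter-based accounting described above can be illustrated with a
minimal sketch. This is not the code from the patch: SimpleTracker and
TrackedDict are hypothetical names invented for illustration, and Impala's
real MemTracker has a different interface. The sketch assumes the
counter mirrors the vector's capacity, so every growth or release of the
vector's buffer is reported as a delta to the tracker.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memory tracker (not Impala's MemTracker API).
class SimpleTracker {
 public:
  void Consume(int64_t bytes) { consumed_ += bytes; }
  void Release(int64_t bytes) { consumed_ -= bytes; }
  int64_t consumed() const { return consumed_; }

 private:
  int64_t consumed_ = 0;
};

// Hypothetical dictionary table. The storage still comes from std::allocator
// via std::vector; only the number of bytes held is reported to the tracker.
template <typename T>
class TrackedDict {
 public:
  explicit TrackedDict(SimpleTracker* tracker) : tracker_(tracker) {}

  // Give back everything that was reported when the dictionary goes away.
  ~TrackedDict() { tracker_->Release(tracked_bytes_); }

  TrackedDict(const TrackedDict&) = delete;
  TrackedDict& operator=(const TrackedDict&) = delete;

  // Append a value, then report any change in the vector's footprint.
  void Add(const T& value) {
    values_.push_back(value);
    SyncTracker();
  }

  // Drop all entries and the underlying buffer, then report the change.
  void Clear() {
    std::vector<T>().swap(values_);
    SyncTracker();
  }

  std::size_t size() const { return values_.size(); }
  int64_t tracked_bytes() const { return tracked_bytes_; }

 private:
  // Compare the vector's current capacity in bytes with the counter and
  // forward the difference to the tracker.
  void SyncTracker() {
    const int64_t now = static_cast<int64_t>(values_.capacity() * sizeof(T));
    const int64_t delta = now - tracked_bytes_;
    if (delta > 0) tracker_->Consume(delta);
    if (delta < 0) tracker_->Release(-delta);
    tracked_bytes_ = now;
  }

  SimpleTracker* tracker_;
  std::vector<T> values_;
  int64_t tracked_bytes_ = 0;  // the counter: bytes currently reported
};
```

With this sketch, adding 40,000 entries of type int64_t to a
TrackedDict<int64_t> would report at least 320,000 bytes to the tracker;
the figure can be higher because a vector's capacity may exceed its size.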

Testing
-------
Ran all the backend and end-to-end tests with no failures.

Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Reviewed-on: http://gerrit.cloudera.org:8080/8034
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins

> Use tracked memory for DictDecoder and DictEncoder
> --------------------------------------------------
>
>                 Key: IMPALA-5522
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5522
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Tim Armstrong
>            Assignee: Pranay Singh
>            Priority: Minor
>              Labels: ramp-up
>
> Currently the dictionary decoder and encoder use std::vectors to store the tables mapping codeword to value and vice-versa. This memory is not accounted for by Impala's memory management infrastructure.
> These dictionaries can be large enough that it is important to track the memory. E.g. a 40,000-entry dictionary with 8-byte entries uses 320 KB of memory.
> We should either allocate the memory from MemPools, or track the memory directly with a MemTracker.


