Posted to issues@impala.apache.org by "Dimitris Tsirogiannis (JIRA)" <ji...@apache.org> on 2017/05/10 17:16:04 UTC

[jira] [Resolved] (IMPALA-4029) Reduce memory requirements for storing THdfsFileDesc

     [ https://issues.apache.org/jira/browse/IMPALA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimitris Tsirogiannis resolved IMPALA-4029.
-------------------------------------------
    Resolution: Fixed

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Reviewed-on: http://gerrit.cloudera.org:8080/6406
Reviewed-by: Dimitris Tsirogiannis <dt...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
14 files changed, 572 insertions(+), 323 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved

> Reduce memory requirements for storing THdfsFileDesc
> ----------------------------------------------------
>
>                 Key: IMPALA-4029
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4029
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>    Affects Versions: Impala 2.7.0
>            Reporter: Dimitris Tsirogiannis
>            Assignee: Dimitris Tsirogiannis
>            Priority: Critical
>              Labels: catalog-server, performance
>
> The in-memory representation of HDFS files in the catalog is highly inefficient and can be significantly improved. Currently, the catalog uses ~400-500 bytes per THdfsFileDescriptor object, which essentially includes: a) the file name and b) a list of THdfsFileBlocks. Every file block stores information about its replicas, their disk IDs, and whether each replica is cached. All of that information is currently stored in Thrift objects and can be compressed significantly.
> Also, the catalog and the impalad services spend a lot of time (and memory) serializing and deserializing Thrift objects. Using a more efficient serialization library (e.g. FlatBuffers) can significantly improve memory efficiency and speed when processing catalog updates.
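
To illustrate the kind of compression the description has in mind, here is a minimal sketch of packing one replica's host index and cached flag into a single 16-bit value instead of holding them in separate object fields. The class name ReplicaPacking, its methods, and the bit layout are assumptions made for this example only; they are not taken from the committed change.

```java
// Hypothetical sketch: not the layout used by the actual patch.
public final class ReplicaPacking {
    // Assumption: the top bit of a 16-bit value flags a cached replica,
    // and the remaining 15 bits hold an index into a per-table host list.
    private static final int CACHED_BIT = 1 << 15;
    private static final int HOST_MASK = CACHED_BIT - 1;

    private ReplicaPacking() {}

    /** Packs a host index (0..32767) and a cached flag into one short. */
    public static short pack(int hostIndex, boolean cached) {
        if (hostIndex < 0 || hostIndex > HOST_MASK) {
            throw new IllegalArgumentException("host index out of range: " + hostIndex);
        }
        return (short) (cached ? (hostIndex | CACHED_BIT) : hostIndex);
    }

    /** Extracts the host index, masking off the cached flag. */
    public static int hostIndex(short packed) {
        return packed & HOST_MASK;
    }

    /** Returns true if the cached flag is set. */
    public static boolean isCached(short packed) {
        return (packed & CACHED_BIT) != 0;
    }

    public static void main(String[] args) {
        // Three replicas of one block stored in 6 bytes of payload.
        short[] replicas = { pack(3, false), pack(17, true), pack(42, false) };
        for (short r : replicas) {
            System.out.println("host=" + hostIndex(r) + " cached=" + isCached(r));
        }
    }
}
```

With a layout like this, the replicas of a block fit in a primitive short[] rather than a list of per-replica objects, which avoids per-object header and reference overhead on the JVM. The 15-bit limit on the host index is a constraint of this sketch.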



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)