Posted to issues-all@impala.apache.org by "Tianyi Wang (JIRA)" <ji...@apache.org> on 2018/06/08 18:52:00 UTC

[jira] [Comment Edited] (IMPALA-5990) End-to-end compression of metadata

    [ https://issues.apache.org/jira/browse/IMPALA-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506446#comment-16506446 ] 

Tianyi Wang edited comment on IMPALA-5990 at 6/8/18 6:51 PM:
-------------------------------------------------------------

Today I learned that a Thrift message larger than 4GB can be sent with TBufferedTransport and TBinaryProtocol, at least in C++. The limits lie elsewhere: TMemoryBuffer cannot handle a message larger than 4GB, Thrift cannot handle a single std::string larger than 4GB, etc.

So after IMPALA-5990, we have seen a ~6GB compressed catalog, and it works just fine.
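
As a rough illustration of where a 4GB ceiling can come from, here is a minimal sketch that assumes the size is stored in a 32-bit unsigned field. That assumption is mine, and the helper names are hypothetical; they are not Thrift or Impala APIs.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a Thrift or Impala API): true if a byte count can
// be stored in a 32-bit unsigned length field without losing information.
constexpr bool FitsInUint32(std::uint64_t num_bytes) {
  return num_bytes <= std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical helper: the value a silent narrowing cast would store instead.
constexpr std::uint32_t TruncateToUint32(std::uint64_t num_bytes) {
  return static_cast<std::uint32_t>(num_bytes);
}

constexpr std::uint64_t kGiB = 1024ULL * 1024ULL * 1024ULL;

// A 3GB size still fits; a 6GB size does not and wraps around to 2GB.
static_assert(FitsInUint32(3 * kGiB), "3GB fits in 32 bits");
static_assert(!FitsInUint32(6 * kGiB), "6GB does not fit in 32 bits");
static_assert(TruncateToUint32(6 * kGiB) == 2 * kGiB, "6GB wraps to 2GB");
```

Under that assumption, any component that records a whole message or a single string length in 32 bits would hit the limit, while a transport that only ever handles smaller chunks would not.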


was (Author: tianyiwang):
Today I learned that a thrift message larger than 4GB can be used with TBufferedTransport and TBinaryProtocol. The limits are at other places: TMemoryBuffer cannot handle a message larger than 4GB, thrift cannot handle a single std::string larger than 4GB, etc.

So after IMPALA-5990, we have seen ~6GB compressed catalog and it works just fine.

> End-to-end compression of metadata
> ----------------------------------
>
>                 Key: IMPALA-5990
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5990
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: Alexander Behm
>            Assignee: Tianyi Wang
>            Priority: Critical
>             Fix For: Impala 2.12.0
>
>
> The metadata of large tables can become quite big, making it costly to hold in the statestore and disseminate to coordinator impalads. The metadata can even get so big that fundamental limits like the JVM 2GB array size and the Thrift 4GB limit are hit, leading to downtime.
> To reduce the statestore metadata topic size we have an existing "compact_catalog_topic" flag, which LZ4-compresses the metadata payload for the C++ codepaths catalogd->statestore and statestore->impalad.
> Unfortunately, the metadata is not compressed in the same way during the FE->BE transition on the catalogd and the BE->FE transition on the impalad.
> The goal of this change is to enable end-to-end compression for the full path of metadata dissemination. The existing code paths also need significant cleanup/streamlining. Ideally, the new code should provide consistent size limits everywhere.


