Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2018/03/16 19:15:00 UTC

[jira] [Resolved] (IMPALA-6675) Change default configuration to --compact_catalog_topic=true

     [ https://issues.apache.org/jira/browse/IMPALA-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Behm resolved IMPALA-6675.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0
                   Impala 3.0

commit 2f508183cb4dfefe1c67bf8c1988f350700763fe
Author: Alex Behm <al...@cloudera.com>
Date:   Thu Mar 15 09:57:10 2018 -0700

    IMPALA-6675: Default to --compact_catalog_topic=true.
    
    Testing:
    - Ran a few queries locally
    - Ran test_compact_catalog_updates.py locally
    
    Mostafa's perf evaluation:
    - 130 node cluster
    - Load metadata after INVALIDATE METADATA for 4 tables,
      each with 100K partitions and 1 million files
    
    Results, compaction on vs. compaction off:
    - 5.7x reduction in topic size and network traffic
    - 30% reduction in Catalog+Statestore CPU
    - 15% speedup in query time
    - Compacting the topic takes ~22s in the Catalog
    - Time spent by Statestore sending the topics is
      reduced from 90s to 17s
    - Max topic update duration reduced from 72s to 11s
    
    Change-Id: I39a2dd42a21ef448b85278a8cef3c1d0112b844f
    Reviewed-on: http://gerrit.cloudera.org:8080/9661
    Reviewed-by: Alex Behm <al...@cloudera.com>
    Tested-by: Impala Public Jenkins
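The two before/after timings in the perf evaluation can be expressed as reduction factors, which makes them easier to compare with the reported 5.7x topic-size reduction. The Python sketch below uses only the numbers from the commit message. The helper name `reduction_factor` is illustrative and does not come from the commit.

```python
# Reduction factors computed from the before/after figures in the commit
# message (compaction off -> compaction on). The numbers come from the
# commit; the helper is illustrative only.
reported = {
    "statestore topic send time (s)": (90, 17),
    "max topic update duration (s)": (72, 11),
}


def reduction_factor(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


for metric, (off, on) in reported.items():
    print(f"{metric}: {off} -> {on}, {reduction_factor(off, on):.1f}x")
# prints:
# statestore topic send time (s): 90 -> 17, 5.3x
# max topic update duration (s): 72 -> 11, 6.5x
```

Both timing improvements (about 5.3x and 6.5x) are in the same range as the 5.7x reduction in topic size. This fits the expectation that statestore send time scales roughly with payload size, although the commit itself does not state that relationship.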


> Change default configuration to --compact_catalog_topic=true
> ------------------------------------------------------------
>
>                 Key: IMPALA-6675
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6675
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Alexander Behm
>            Assignee: Alexander Behm
>            Priority: Major
>             Fix For: Impala 3.0, Impala 2.12.0
>
>
> The catalog metadata can become large and lead to excessive network traffic due to dissemination via the statestore. The --compact_catalog_topic flag was introduced to mitigate this issue by compressing the catalog topic entries to reduce their serialized size.
> This saves network bandwidth at the cost of a small amount of CPU time.
> To improve the out-of-the-box experience of users we should enable this flag by default.
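The trade-off in the issue spends CPU time compressing each catalog topic entry so that the payload sent through the statestore is smaller. The Python sketch below illustrates that idea. It is not Impala's code: the real catalog topic carries serialized Thrift objects compressed with whatever codec Impala actually uses. Here, a hypothetical and deliberately repetitive list of partition descriptors is serialized to JSON and compressed with the standard-library `zlib` module as a stand-in. The names `make_partition_entry`, `compact_entry`, and `expand_entry` are hypothetical.

```python
import json
import zlib


def make_partition_entry(table: str, num_partitions: int) -> bytes:
    """Build a hypothetical catalog entry for `table`.

    Real catalog topic entries are serialized Thrift objects; JSON is used
    here only to produce a realistic-looking, repetitive byte string.
    """
    partitions = [
        {
            "table": table,
            "spec": f"year=2018/month={i % 12 + 1}/day={i % 28 + 1}",
            "location": f"hdfs://namenode/warehouse/{table}/part-{i}",
            "file_format": "PARQUET",
        }
        for i in range(num_partitions)
    ]
    return json.dumps(partitions).encode("utf-8")


def compact_entry(raw: bytes) -> bytes:
    # Spend CPU on the sender side to shrink the entry (zlib is a stand-in
    # for whatever codec the catalog service actually uses).
    return zlib.compress(raw)


def expand_entry(compacted: bytes) -> bytes:
    # Receivers pay a decompression cost to recover the original bytes.
    return zlib.decompress(compacted)


raw = make_partition_entry("sales", 1000)
compacted = compact_entry(raw)
assert expand_entry(compacted) == raw  # compression is lossless
print(f"raw={len(raw)} bytes, compacted={len(compacted)} bytes, "
      f"ratio={len(raw) / len(compacted):.1f}x")
```

Partition paths and specs repeat heavily, so this kind of metadata tends to compress well. That is consistent with the large topic-size reduction reported in the commit message, but the exact ratio depends on the data and the codec.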



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)