Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2018/03/16 19:15:00 UTC
[jira] [Resolved] (IMPALA-6675) Change default configuration to --compact_catalog_topic=true
[ https://issues.apache.org/jira/browse/IMPALA-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Behm resolved IMPALA-6675.
------------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0, Impala 3.0
commit 2f508183cb4dfefe1c67bf8c1988f350700763fe
Author: Alex Behm <al...@cloudera.com>
Date: Thu Mar 15 09:57:10 2018 -0700
IMPALA-6675: Default to --compact_catalog_topic=true.
Testing:
- Ran a few queries locally
- Ran test_compact_catalog_updates.py locally
Mostafa's perf evaluation:
- 130 node cluster
- Load metadata after invalidate for 4 tables, each
with 100K partitions and 1 million files
Results, compaction on vs. compaction off:
- 5.7x reduction in topic size and network traffic
- 30% reduction in Catalog+Statestore CPU
- 15% speedup in query time
- Compaction of topic takes ~22s in the Catalog
- Time spent by Statestore sending the topics is
reduced from 90s to 17s
- Max topic update duration reduced from 72s to 11s
Change-Id: I39a2dd42a21ef448b85278a8cef3c1d0112b844f
Reviewed-on: http://gerrit.cloudera.org:8080/9661
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
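
For reference, the two before/after timings quoted in the perf evaluation above can be turned into ratios with a few lines of Python. The input numbers are taken from the commit message; the helper name `reduction` is only illustrative and is not part of Impala.

```python
def reduction(before_s, after_s):
    """Return (speedup factor, percent reduction) for a before/after timing."""
    factor = before_s / after_s
    percent = (before_s - after_s) / before_s * 100.0
    return factor, percent


# Statestore time spent sending the topics: 90s -> 17s
send_factor, send_pct = reduction(90, 17)
# Max topic update duration: 72s -> 11s
update_factor, update_pct = reduction(72, 11)

print(f"Statestore send time: {send_factor:.1f}x shorter ({send_pct:.0f}% less)")
print(f"Max topic update:     {update_factor:.1f}x shorter ({update_pct:.0f}% less)")
```

Both timings improve by roughly the same order as the 5.7x reduction in topic size.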
> Change default configuration to --compact_catalog_topic=true
> ------------------------------------------------------------
>
> Key: IMPALA-6675
> URL: https://issues.apache.org/jira/browse/IMPALA-6675
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.11.0
> Reporter: Alexander Behm
> Assignee: Alexander Behm
> Priority: Major
> Fix For: Impala 3.0, Impala 2.12.0
>
>
> The catalog metadata can become large and lead to excessive network traffic due to dissemination via the statestore. The --compact_catalog_topic flag was introduced to mitigate this issue by compressing the catalog topic entries to reduce their serialized size.
> This saves network bandwidth at the cost of a small amount of CPU time.
> To improve the out-of-the-box experience for users, we should enable this flag by default.
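
The trade-off described above (smaller serialized entries in exchange for some CPU time) can be sketched in Python. This is only an illustration under stated assumptions: the message does not say which serialization format or compression codec Impala uses, so JSON and `zlib` are stand-ins here, and `build_fake_topic_entry`, `compact`, and `expand` are hypothetical names.

```python
import json
import time
import zlib


def build_fake_topic_entry(num_partitions):
    """Hypothetical stand-in for a serialized catalog topic entry."""
    partitions = [
        {"id": i, "location": f"hdfs://ns/warehouse/tbl/part={i}", "num_files": 10}
        for i in range(num_partitions)
    ]
    return json.dumps({"table": "tbl", "partitions": partitions}).encode("utf-8")


def compact(entry):
    """Compress an entry before it is sent (zlib is an assumption)."""
    return zlib.compress(entry)


def expand(entry):
    """Decompress an entry on the receiving side."""
    return zlib.decompress(entry)


raw = build_fake_topic_entry(10_000)

start = time.perf_counter()
packed = compact(raw)
elapsed = time.perf_counter() - start

# The round trip must be lossless for the receiver to rebuild the metadata.
assert expand(packed) == raw

print(f"raw bytes:       {len(raw)}")
print(f"compacted bytes: {len(packed)}")
print(f"size ratio:      {len(raw) / len(packed):.1f}x")
print(f"compress time:   {elapsed * 1000:.1f} ms")
```

Metadata with many similar partition entries is highly repetitive, which is why compression pays off; the exact ratio depends on the data and codec and will not match the 5.7x figure measured on the real cluster.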
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)