Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/07/28 08:00:28 UTC

[jira] [Commented] (IMPALA-3127) Decouple partitions from tables

    [ https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166239#comment-17166239 ] 

ASF subversion and git services commented on IMPALA-3127:
---------------------------------------------------------

Commit 074731e2bcf37643710f2fdf236829991a462fc3 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=074731e ]

IMPALA-3127: Support incremental metadata updates in partition level

Currently, partitions are tightly integrated into the HdfsTable objects.
Catalogd has to transmit the entire table metadata even when only a few
partitions change. This wastes resources and can lead to OOM when
transmitting large tables due to the 2GB JVM array limit.

This patch makes HdfsPartition extend CatalogObject so the catalogd can
send partitions as individual catalog objects. Consequently, table
objects in the catalog topic update can have minimal partition maps that
only contain the partition ids, which reduces the thrift object size for
large tables. The catalog object key of HdfsPartition consists of db
name, table name and partition name.
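
A minimal sketch of that split, assuming hypothetical Python stand-ins
(PartitionObject, MinimalTableObject, build_topic_entries) rather than
the actual catalogd classes or thrift structs: each partition becomes
its own entry keyed by (db name, table name, partition name), and the
table entry keeps only the partition ids.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PartitionObject:
    """Hypothetical stand-in for a partition sent as its own catalog object."""
    db: str
    table: str
    name: str            # partition name, e.g. "year=2010"
    partition_id: int

    def key(self) -> Tuple[str, str, str]:
        # The key is made of db name, table name and partition name.
        return (self.db, self.table, self.name)


@dataclass
class MinimalTableObject:
    """Hypothetical table entry that carries partition ids only."""
    db: str
    table: str
    partition_ids: List[int]


def build_topic_entries(db: str, table: str, partitions: List[PartitionObject]):
    """Split one table into a minimal table entry plus per-partition entries."""
    table_obj = MinimalTableObject(
        db, table, sorted(p.partition_id for p in partitions))
    partition_entries: Dict[Tuple[str, str, str], PartitionObject] = {
        p.key(): p for p in partitions}
    return table_obj, partition_entries
```

The table entry stays small no matter how much metadata each partition
carries, because it holds ids only.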

In "full" topic mode (catalog_topic_mode=full), catalogd only sends
changed partitions with their latest table states. The latest table
states are table objects with the minimal partition map. Legacy
coordinators use the partition list to pick up existing (unchanged)
partitions from the existing table object and new partitions in the
catalog update.
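
The merge on a legacy coordinator could look roughly like the sketch
below. The function name and the dict-based shapes are assumptions made
for illustration, not the coordinator's real code: the partition id list
from the table object selects which cached partitions survive, and
partitions shipped in the update fill in the rest.

```python
def merge_table_update(cached_partitions, partition_ids, new_partitions):
    """Rebuild a table's partition map from a minimal table object.

    cached_partitions: dict of partition id -> partition held before the update
    partition_ids:     ids listed in the incoming minimal table object
    new_partitions:    dict of partition id -> partition shipped in this update
    """
    merged = {}
    for pid in partition_ids:
        if pid in new_partitions:
            # New or changed partition delivered with this catalog update.
            merged[pid] = new_partitions[pid]
        elif pid in cached_partitions:
            # Unchanged partition, reused from the existing table object.
            merged[pid] = cached_partitions[pid]
        else:
            raise KeyError(f"partition id {pid} is neither cached nor in the update")
    # Cached partitions whose ids are not listed are dropped implicitly.
    return merged
```

For example, if the cache holds ids 0 and 1 and the update lists ids
[0, 2] with a new partition 2, the result keeps 0, takes 2 from the
update, and discards 1.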

Currently, partition instances are immutable - all partition
modifications are implemented by deleting the old instance and adding a
new one with a new partition id. Since partition ids are generated by a
global counter, newer partition instances will have larger partition
ids. So catalogd maintains a watermark for each table as the max sent
partition id. Partition instances with ids larger than this are new
partitions that should be sent in the next catalog update. Deleted
partition instances are kept in a set for each table until the next
catalog update. If there are no updates on the same partition name by
then, catalogd will send a deletion for the partition.
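
The watermark bookkeeping can be sketched as follows. This is a
simplified model under stated assumptions, not the catalogd
implementation: TableTracker and its methods are hypothetical names,
partitions are reduced to a name-to-id map, and a module-level counter
stands in for the global id generator.

```python
import itertools

# Stand-in for the global partition id counter shared by all tables.
_next_partition_id = itertools.count()


class TableTracker:
    """Hypothetical per-table bookkeeping for incremental partition updates."""

    def __init__(self):
        self.live = {}              # partition name -> current partition id
        self.max_sent_id = -1       # watermark: largest partition id sent so far
        self.dropped_names = set()  # names of instances deleted since last update

    def upsert(self, name):
        # Instances are immutable: an update deletes the old instance and
        # adds a new one with a fresh, larger id.
        if name in self.live:
            self.dropped_names.add(name)
        self.live[name] = next(_next_partition_id)

    def drop(self, name):
        del self.live[name]
        self.dropped_names.add(name)

    def collect_update(self):
        """Return (new partitions, deleted names) for the next catalog update."""
        new = {n: i for n, i in self.live.items() if i > self.max_sent_id}
        # A deletion is sent only if no new instance took over the same name.
        deletions = {n for n in self.dropped_names if n not in self.live}
        if self.live:
            self.max_sent_id = max(self.max_sent_id, max(self.live.values()))
        self.dropped_names.clear()
        return new, deletions
```

After an update that replaces year=2010 and drops year=2009, the next
call to collect_update() returns only the new year=2010 instance plus a
deletion for year=2009. A further call with no changes returns nothing.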

For dropped or invalidated tables, catalogd will still send deletions on
their partitions. Although coordinators do not use these deletions
(they delete the partitions when they delete the table instances), the
deletions help avoid topic entry leaks in the statestore catalog topic.

In "minimal" topic mode (catalog_topic_mode=minimal), catalogd only
sends invalidations on tables and stale partition instances. Each
partition instance is identified by its partition id. LocalCatalog
coordinators use the partition invalidations to evict stale partitions
in time. For instance, let's say partition(year=2010) is updated in
catalogd. This is done by deleting the old partition instance
partition(id=0, year=2010) and adding a new partition instance
partition(id=1, year=2010). Catalogd will send invalidations on the
table and partition instance with id=0, but not the one with id=1. A
LocalCatalog coordinator will invalidate the partition instance(id=0) if
it's in the cache. If the partition instance(id=1) is cached, it's
already the latest version since partition instances are immutable, so
we don't need to invalidate it.
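
The year=2010 example can be modelled with a small id-keyed cache. The
PartitionCache class below is a hypothetical illustration, not the
LocalCatalog cache itself: invalidations name stale partition ids, and
any newer instance is left untouched.

```python
class PartitionCache:
    """Hypothetical coordinator-side cache keyed by partition id."""

    def __init__(self):
        self._by_id = {}

    def put(self, partition_id, metadata):
        self._by_id[partition_id] = metadata

    def invalidate(self, stale_ids):
        """Evict the stale instances that are cached; return the evicted ids."""
        evicted = []
        for pid in stale_ids:
            if pid in self._by_id:
                del self._by_id[pid]
                evicted.append(pid)
        return evicted

    def cached_ids(self):
        return set(self._by_id)


cache = PartitionCache()
cache.put(0, {"name": "year=2010"})  # old instance
cache.put(1, {"name": "year=2010"})  # new instance, already the latest
cache.invalidate([0])                # catalogd invalidates id=0 only
```

Because instances are immutable, keying the cache by partition id means
an entry never needs to be refreshed in place, only evicted.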

Tests
 - Run exhaustive tests.
 - Run exhaustive test_ddl.py in LocalCatalog mode.
 - Add test in test_local_catalog.py to verify stale partitions are
   invalidated in LocalCatalog when partitions are updated.

Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Reviewed-on: http://gerrit.cloudera.org:8080/16159
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>


> Decouple partitions from tables
> -------------------------------
>
>                 Key: IMPALA-3127
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3127
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.2.4
>            Reporter: Dimitris Tsirogiannis
>            Assignee: Quanlong Huang
>            Priority: Major
>              Labels: catalog-server, performance
>
> Currently, partitions are tightly integrated into the HdfsTable objects, making incremental metadata updates difficult to perform. Furthermore, the catalog transmits entire table metadata even when only few partitions change, introducing significant latencies, wasting network bandwidth and CPU cycles while updating table metadata at the receiving impalads. As a first step, we should decouple partitions from tables and add them as a separate level in the hierarchy of catalog entities (server-db-table-partition). Subsequently, the catalog should transmit only entities that have changed after DDL/DML statements.


