Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/06/16 15:34:01 UTC

[jira] [Commented] (IMPALA-3234) Catalog should send incremental metadata changes to Impalads

    [ https://issues.apache.org/jira/browse/IMPALA-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136739#comment-17136739 ] 

ASF subversion and git services commented on IMPALA-3234:
---------------------------------------------------------

Commit 419aa2e30db326f02e9b4ec563ef7864e82df86e in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=419aa2e ]

IMPALA-9778: Refactor partition modifications in DDL/DMLs

After this patch, DDL/DMLs that update partition metadata no longer
update partitions in place. Instead, they always create new partition
instances and use them to replace the existing ones. This is enforced by
making HdfsPartition immutable. This has several benefits:
 - HdfsPartition can be shared across table versions. In full catalog
   update mode, catalog update can ignore unchanged partitions
   (IMPALA-3234) and send the update at partition granularity.
 - Aborted DDL/DMLs won't leave partition metadata in a bad state (e.g.
   IMPALA-8406), which usually requires invalidation to recover.
 - Fetch-on-demand coordinators can cache partition metadata using the
   partition id as the key. When the table version changes, only the
   metadata of changed partitions needs to be reloaded (IMPALA-7533).
 - In the work of decoupling partitions from tables (IMPALA-3127), we
   don't need to assign a catalog version to partitions since the
   partition ids already identify the partitions.
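
As a rough illustration of the first and third benefits, the sketch below
compares two table versions by partition id. All names here
(PartitionDeltaSketch, Part, addedIds, removedIds) are hypothetical and
are not Impala's catalog classes. The only assumption taken from the text
above is that a modified partition gets a new instance with a new id,
while unchanged partitions are shared between versions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not Impala's actual catalog code.
final class PartitionDeltaSketch {
  /** Immutable stand-in for a partition. A modification yields a new id. */
  static final class Part {
    final long id;
    final String name;
    final long numRows;

    Part(long id, String name, long numRows) {
      this.id = id;
      this.name = name;
      this.numRows = numRows;
    }
  }

  /** Ids present only in the new version: these need to be sent or reloaded. */
  static Set<Long> addedIds(Map<Long, Part> oldVersion, Map<Long, Part> newVersion) {
    Set<Long> ids = new TreeSet<>(newVersion.keySet());
    ids.removeAll(oldVersion.keySet());
    return ids;
  }

  /** Ids present only in the old version: these can be dropped from a cache. */
  static Set<Long> removedIds(Map<Long, Part> oldVersion, Map<Long, Part> newVersion) {
    Set<Long> ids = new TreeSet<>(oldVersion.keySet());
    ids.removeAll(newVersion.keySet());
    return ids;
  }
}
```

Because ids alone identify partition contents in this sketch, a receiver
that already holds an id does not need to compare any other field.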

However, HdfsPartition is not yet strictly immutable. Although all its
fields are final, some of them still reference mutable objects. More
refactoring is needed to make it strictly immutable. This patch focuses
on refactoring the DDL/DML code paths.
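
The gap between "all fields final" and "strictly immutable" can be shown
with a small generic Java example. The class names below (ShallowPart,
DeepPart) are hypothetical and do not reflect HdfsPartition's real
fields. The example only demonstrates that a final field holding a
mutable list can still change underneath its owner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not Impala's actual catalog code.
final class ShallowImmutabilitySketch {
  /** All fields are final, but the list is shared with the caller. */
  static final class ShallowPart {
    final List<String> files;

    ShallowPart(List<String> files) {
      this.files = files; // keeps a reference to a mutable object
    }
  }

  /** Takes a defensive copy and exposes it as a read-only view. */
  static final class DeepPart {
    final List<String> files;

    DeepPart(List<String> files) {
      this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }
  }
}
```

If the caller later appends to the list it passed in, ShallowPart
observes the change and DeepPart does not.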

Changes:
 - Make all fields of HdfsPartition final. Move the
   HdfsPartition constructor logic and all its update methods into
   HdfsPartition.Builder.
 - Replace in-place updates on HdfsPartition with creating a new
   instance and dropping the old one. HdfsPartition.Builder represents
   the in-progress modifications. Once all modifications are done, call
   its build() method to create the new HdfsPartition instance. The old
   HdfsPartition instance is replaced only at the end of the
   modifications.
 - Move the "dirty" marker of HdfsPartition into a map in HdfsTable that
   maps the old partition id to the in-progress partition builder.
   For "dirty" partitions, we reload their HMS metadata and file
   metadata.
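
The builder-and-replace flow described in the list above can be sketched
roughly as follows. This is a simplified illustration under stated
assumptions, not the patch itself: the classes Partition, Builder and
Table, and the methods markDirty and commitDirty, are hypothetical
stand-ins for HdfsPartition, HdfsPartition.Builder and HdfsTable, and the
reload of HMS and file metadata is represented only by plain setters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Impala's actual catalog code.
final class BuilderSketch {
  /** Immutable partition: every field is final and set once. */
  static final class Partition {
    final long id;
    final String location;
    final long numRows;

    private Partition(long id, String location, long numRows) {
      this.id = id;
      this.location = location;
      this.numRows = numRows;
    }
  }

  /** Holds in-progress modifications, seeded from the old partition. */
  static final class Builder {
    private String location;
    private long numRows;

    Builder(Partition old) {
      this.location = old.location;
      this.numRows = old.numRows;
    }

    Builder setLocation(String location) { this.location = location; return this; }
    Builder setNumRows(long numRows) { this.numRows = numRows; return this; }

    Partition build(long newId) { return new Partition(newId, location, numRows); }
  }

  static final class Table {
    private final Map<Long, Partition> partitions = new HashMap<>();
    // Old partition id -> in-progress builder (the "dirty" marker).
    private final Map<Long, Builder> dirtyPartitions = new HashMap<>();
    private long nextId = 0;

    Partition addPartition(String location, long numRows) {
      Partition p = new Partition(nextId++, location, numRows);
      partitions.put(p.id, p);
      return p;
    }

    Builder markDirty(long partitionId) {
      Partition old = partitions.get(partitionId);
      if (old == null) throw new IllegalArgumentException("unknown partition " + partitionId);
      Builder b = new Builder(old);
      dirtyPartitions.put(partitionId, b);
      return b;
    }

    /** Swaps each dirty partition for a newly built one, then clears the markers. */
    void commitDirty() {
      for (Map.Entry<Long, Builder> e : dirtyPartitions.entrySet()) {
        partitions.remove(e.getKey());
        Partition replacement = e.getValue().build(nextId++);
        partitions.put(replacement.id, replacement);
      }
      dirtyPartitions.clear();
    }

    Partition get(long id) { return partitions.get(id); }
    int dirtyCount() { return dirtyPartitions.size(); }
  }
}
```

In this sketch the old instance stays visible and unmodified until
commitDirty() runs, so an operation that aborts before that point leaves
the table's partitions as they were.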

Tests:
 - No new tests are added since the existing tests already provide
   sufficient coverage.
 - Ran CORE tests.

Change-Id: Ib52e5810d01d5e0c910daacb9c98977426d3914c
Reviewed-on: http://gerrit.cloudera.org:8080/15985
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Catalog should send incremental metadata changes to Impalads
> ------------------------------------------------------------
>
>                 Key: IMPALA-3234
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3234
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.5.0
>            Reporter: Dimitris Tsirogiannis
>            Priority: Major
>              Labels: catalog-server
>
> Currently, every table metadata change (DDL/DML operation) causes the catalog to serialize the entire table metadata and send it to: a) the impalad node that triggered the operation and b) all the impalad nodes via a statestore update. The catalog should instead send only the portion of the table metadata that changed. Furthermore, certain operations, like DESCRIBE statements, don't require the full table metadata.


