Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/10/07 14:08:00 UTC

[jira] [Commented] (IMPALA-9664) Support Hive replication for ACID tables

    [ https://issues.apache.org/jira/browse/IMPALA-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209566#comment-17209566 ] 

ASF subversion and git services commented on IMPALA-9664:
---------------------------------------------------------

Commit d6c664ef71d7d338d2a3517d90b76c3606933ae3 in impala's branch refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d6c664e ]

IMPALA-9664: Fix typo in test_event_processing.py

The test tries to add src_db within a database
object, which would fail when the database managed
location is present. The test doesn't currently
fail in ASF master since the database doesn't have
a managed location yet. It will start failing once
the managed location for databases is available in
the toolchain build of Hive.

Testing:
1. The test was passing before the patch, probably
because the managed db location was not set and the
modified line was not getting executed. I made sure the
test passes with the patch as well.
2. I applied the patch in an environment where the
managed db location is available, and the error disappears.
The test still fails there for an unrelated reason
(HIVE-23995), so we should be aware of this when the
toolchain Hive build is bumped up.

Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3
Reviewed-on: http://gerrit.cloudera.org:8080/16547
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Support Hive replication for ACID tables
> ----------------------------------------
>
>                 Key: IMPALA-9664
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9664
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Critical
>             Fix For: Impala 4.0
>
>
> According to the Hive source code, insert events for transactional tables are fired with a different API, {{addWriteNotificationLog}}. Currently Impala calls {{fireListenerEvent}} for both transactional and non-transactional tables. We should look at the difference between the two APIs and see whether we need to handle transactional tables differently.
> References:
> https://github.com/apache/hive/blob/c3afb57bdb1041f566fbbd896f625328fc9656a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2402
> https://github.com/apache/hive/blob/c3afb57bdb1041f566fbbd896f625328fc9656a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2236
> These insert events are used by tools like Hive replication to replicate changes in ACID tables. Now that Impala can insert data into ACID tables, it should also generate the insert events appropriately so that replication works seamlessly. Additionally, the {{truncate table}} command should use the HMS API to truncate the table instead of deleting the files directly from the filesystem, since the HMS API takes care of moving the files to a replication change management directory so that replication can still access the dropped data files.
> Note that for external tables, Hive replication doesn't need to keep track of the files. It only replicates the table metadata based on events and the data files are "distcped" to the target cluster.
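
The choice between the two event APIs discussed in the description could be sketched roughly as follows. This is only an illustration, not Impala's implementation: the helper names is_transactional and choose_insert_event_api are hypothetical, and the sketch assumes that a table's ACID status can be read from its 'transactional' table parameter, following the Hive convention. The strings returned are just the names of the two HMS client APIs mentioned above; no metastore call is made.

```python
# Illustrative sketch only; helper names are hypothetical, not Impala code.

def is_transactional(tbl_params):
    """Return True if the table parameters mark the table as ACID.

    Assumes the Hive convention of a 'transactional' table parameter
    whose value is 'true' (compared case-insensitively).
    """
    return tbl_params.get("transactional", "").strip().lower() == "true"


def choose_insert_event_api(tbl_params):
    """Name the HMS API that would be used to fire an insert event."""
    if is_transactional(tbl_params):
        # Transactional tables: Hive fires insert events through this API.
        return "addWriteNotificationLog"
    # Non-transactional tables keep using the generic listener event.
    return "fireListenerEvent"
```

Whether transactional tables actually need different handling is the open question of this issue; the sketch only shows where such a branch would sit.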



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
