Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/12/05 12:37:01 UTC

[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

     [ https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354257 ]

ASF GitHub Bot logged work on HIVE-21213:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Dec/19 12:36
            Start Date: 05/Dec/19 12:36
    Worklog Time Spent: 10m 
      Work Description: ashutosh-bapat commented on pull request #587: HIVE-21213 : Acid table bootstrap replication needs to handle directory created by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354286550
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
 ##########
 @@ -670,4 +678,63 @@ public void testMultiDBTxn() throws Throwable {
     replica.run("drop database " + dbName1 + " cascade");
     replica.run("drop database " + dbName2 + " cascade");
   }
+
+  private void runCompaction(String dbName, String tblName, CompactionType compactionType) throws Throwable {
+    HiveConf hiveConf = new HiveConf(primary.getConf());
+    TxnStore txnHandler = TxnUtils.getTxnStore(hiveConf);
+    txnHandler.compact(new CompactionRequest(dbName, tblName, compactionType));
+    hiveConf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, false);
+    runWorker(hiveConf);
+    runCleaner(hiveConf);
+  }
+
+  private FileStatus[] getDirsInTableLoc(WarehouseInstance wh, String db, String table) throws Throwable {
+    Path tblLoc = new Path(wh.getTable(db, table).getSd().getLocation());
+    FileSystem fs = tblLoc.getFileSystem(wh.getConf());
+    return fs.listStatus(tblLoc, EximUtil.getDirectoryFilter(fs));
+  }
+
+  @Test
+  public void testAcidTablesBootstrapWithCompaction() throws Throwable {
+     String tableName = testName.getMethodName();
+     primary.run("use " + primaryDbName)
+            .run("create table " + tableName + " (id int) clustered by(id) into 3 buckets stored as orc " +
+                    "tblproperties (\"transactional\"=\"true\")")
 
 Review comment:
   Should we add a test for partitioned table as well?
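 
   A rough, hypothetical sketch of what such a partitioned-table variant could look like, following the pattern of the test above (the table layout, inserted values and replication assertions are illustrative assumptions; runCompaction is the helper added in this patch):
 
       // Hypothetical partitioned-table variant, for illustration only.
       @Test
       public void testAcidPartitionedTablesBootstrapWithCompaction() throws Throwable {
         String tableName = testName.getMethodName();
         primary.run("use " + primaryDbName)
                .run("create table " + tableName + " (id int) partitioned by (p int) " +
                        "clustered by(id) into 3 buckets stored as orc " +
                        "tblproperties (\"transactional\"=\"true\")")
                .run("insert into " + tableName + " partition(p=1) values (1), (2)")
                .run("insert into " + tableName + " partition(p=2) values (3), (4)");
         // Compact before the bootstrap dump so the dump sees directories that carry a txn id.
         runCompaction(primaryDbName, tableName, CompactionType.MAJOR);
         WarehouseInstance.Tuple bootstrapDump = primary.dump(primaryDbName, null);
         replica.load(replicatedDbName, bootstrapDump.dumpLocation)
                .run("use " + replicatedDbName)
                .run("select id from " + tableName + " order by id")
                .verifyResults(new String[]{"1", "2", "3", "4"});
       }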
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 354257)

> Acid table bootstrap replication needs to handle directory created by compaction with txn id
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21213
>                 URL: https://issues.apache.org/jira/browse/HIVE-21213
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, HiveServer2, repl
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, HIVE-21213.03.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current implementation of compaction embeds the txn id in the directory name. This isolates queries from reading the directory until compaction has finished and avoids the need for the compactor marking used earlier. During bootstrap replication, each directory is copied as-is, with the same name, from the source cluster to the destination cluster. However, a directory created by compaction with a txn id cannot be copied verbatim, because the txn list at the target may differ from that at the source: a txn id that is valid at the source may correspond to an aborted txn at the target. Conversion logic is therefore required to create a new directory with a txn id that is valid at the target and to dump the data into that newly created directory.
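
A minimal illustrative sketch of the conversion idea described above (this is not the actual Hive patch; the directory-name format and the helper name are assumptions): a directory produced by compaction, such as base_0000007_v0000123, carries the compactor's txn id as a suffix, and during bootstrap that suffix has to be replaced with a txn id that is valid at the target before the data is dumped there.

    // Hypothetical helper, for illustration only: swap the txn-id suffix of a
    // compacted directory name for one that is valid on the target cluster.
    static String renameForTarget(String srcDirName, long validTargetTxnId) {
      int suffixPos = srcDirName.lastIndexOf("_v");
      if (suffixPos < 0) {
        // No txn id embedded in the name; the directory can be copied as-is.
        return srcDirName;
      }
      // Keep the base/delta prefix and write-id range, replace the txn id.
      return srcDirName.substring(0, suffixPos) + "_v" + validTargetTxnId;
    }

    // e.g. renameForTarget("base_0000007_v0000123", 45L) returns "base_0000007_v45"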



--
This message was sent by Atlassian Jira
(v8.3.4#803005)