You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/08/06 16:50:09 UTC

[GitHub] [hive] ashish-kumar-sharma opened a new pull request #1370: HIVE-23887: Reset Columns stats in Export Statement

ashish-kumar-sharma opened a new pull request #1370:
URL: https://github.com/apache/hive/pull/1370


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sankarh commented on a change in pull request #1370: HIVE-23887:Reset Columns stats in Export Statement

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1370:
URL: https://github.com/apache/hive/pull/1370#discussion_r471526397



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {

Review comment:
       Can we also check if import sets stats to true in case of autostats_gather is set to true? For both partitioned and non-partitioned tables. Also check the values of stats listed in StatsSetupConst.SUPPORTED_STATS (atleast ROWNUM should match) in both source and target tables and they should match.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sankarh merged pull request #1370: HIVE-23887: Reset table level basic/column stats during import

Posted by GitBox <gi...@apache.org>.
sankarh merged pull request #1370:
URL: https://github.com/apache/hive/pull/1370


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sankarh commented on a change in pull request #1370: HIVE-23887:Reset Columns stats in Export Statement

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1370:
URL: https://github.com/apache/hive/pull/1370#discussion_r471526397



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {

Review comment:
       Can we also check if import sets stats to true in case of autostats_gather is set to true? For both partitioned and non-partitioned tables. Also check the values of stats listed in StatsSetupConst.SUPPORTED_STATS in both source and target tables and they should match.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sankarh commented on a change in pull request #1370: HIVE-23887:Reset Columns stats in Export Statement

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1370:
URL: https://github.com/apache/hive/pull/1370#discussion_r467462271



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/PartitionSerializer.java
##########
@@ -59,6 +61,11 @@ public void writeTo(JsonWriter writer, ReplicationSpec additionalPropertiesProvi
                   additionalPropertiesProvider.getCurrentReplicationState());
         }
       }
+
+      if (parameters != null && parameters.containsKey(COLUMN_STATS_ACCURATE)) {

Review comment:
       There is already parameters != null check above in this method. Can we move this code there itself?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sankarh commented on a change in pull request #1370: HIVE-23887:Reset Columns stats in Export Statement

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1370:
URL: https://github.com/apache/hive/pull/1370#discussion_r472338632



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2452,9 +2448,16 @@ private Partition loadPartitionInternal(Path loadPath, Table tbl, Map<String, St
         skewedInfo.setSkewedColValueLocationMaps(skewedColValueLocationMaps);
         newCreatedTpart.getSd().setSkewedInfo(skewedInfo);
       }
-      if (!this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+
+      //if hive.stats.autogather = false or resetStatistics = true then

Review comment:
       The comment can answer the "why" rather than "what". The below if statement is self explanatory.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -389,21 +390,56 @@ public void testImportPartitioned() throws Exception {
 
   @Test
   public void testImportPartitionedOrc() throws Exception {
+    //clear and drop table T,Tstage
     runStatementOnDriver("drop table if exists T");
     runStatementOnDriver("drop table if exists Tstage");
-    runStatementOnDriver("create table T (a int, b int) partitioned by (p int) stored" +
-        " as orc tblproperties('transactional'='true')");
-    //Tstage is the target table
-    runStatementOnDriver("create table Tstage (a int, b int) partitioned by (p int) stored" +
-        " as orc tblproperties('transactional'='true')");
+
+    //create target table - T
+    runStatementOnDriver("create table T (a int, b int) partitioned by (p int) stored"
+        + " as orc tblproperties('transactional'='true')");
+
+    //create source table - Tstage
+    runStatementOnDriver("create table Tstage (a int, b int) partitioned by (p int) stored"
+        + " as orc tblproperties('transactional'='true')");
+
     //this creates an ORC data file with correct schema under table root
     runStatementOnDriver("insert into Tstage values(1,2,10),(3,4,11),(5,6,12)");
-    final int[][] rows = {{3}};
-    //now we have an archive with 3 partitions
+    final int[][] rows = { { 3 } };
+
+    //check Partitions statistics
+    List<String> rsTstagePartitionsProperties = runStatementOnDriver("show partitions Tstage");
+    for (String rsTstagePartition : rsTstagePartitionsProperties) {
+      List<String> rsPartitionProperties =
+          runStatementOnDriver("describe formatted Tstage partition(" + rsTstagePartition + ")");
+      Assert.assertEquals("COLUMN_STATS_ACCURATE of partition " + rsTstagePartition + " of Tstage table", true,
+          rsPartitionProperties.contains("\tCOLUMN_STATS_ACCURATE\t{\\\"BASIC_STATS\\\":\\\"true\\\"}"));
+      Assert.assertEquals(" of partition " + rsTstagePartition + " of Tstage table", true,
+          rsPartitionProperties.contains("\tnumRows             \t1                   "));
+    }
+
+    //now we have an archive Tstage with 3 partitions
     runStatementOnDriver("export table Tstage to '" + getWarehouseDir() + "/1'");
 
     //load T
     runStatementOnDriver("import table T from '" + getWarehouseDir() + "/1'");
+
+    //check basic stats in tblproperties of T
+    List<String> rsTProperties = runStatementOnDriver("show tblproperties T");
+    Assert.assertEquals("COLUMN_STATS_ACCURATE of T table", false,

Review comment:
       I think, stats are false as table T was already created. Can we check if it is set to false when import create the table.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2452,9 +2448,16 @@ private Partition loadPartitionInternal(Path loadPath, Table tbl, Map<String, St
         skewedInfo.setSkewedColValueLocationMaps(skewedColValueLocationMaps);
         newCreatedTpart.getSd().setSkewedInfo(skewedInfo);
       }
-      if (!this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+
+      //if hive.stats.autogather = false or resetStatistics = true then
+      //clear partition column stats and set basic stats to false

Review comment:
       Nit: Add single space after "//"

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2452,9 +2448,16 @@ private Partition loadPartitionInternal(Path loadPath, Table tbl, Map<String, St
         skewedInfo.setSkewedColValueLocationMaps(skewedColValueLocationMaps);
         newCreatedTpart.getSd().setSkewedInfo(skewedInfo);
       }
-      if (!this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+
+      //if hive.stats.autogather = false or resetStatistics = true then
+      //clear partition column stats and set basic stats to false
+      if (!this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER) || resetStatistics) {

Review comment:
       Shall check resetStatistics first as it take less cpu cycles in positive flow.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -70,6 +70,7 @@
 
 import com.google.common.collect.ImmutableList;
 
+import org.antlr.runtime.misc.Stats;

Review comment:
       Unused import.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +602,44 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+  @Test
+  public void testImportOrc() throws Exception {
+    //Clear and Drop T and Tstage if exist
+    runStatementOnDriver("drop table if exists T");
+    runStatementOnDriver("drop table if exists Tstage");
+
+    //create target table - T
+    runStatementOnDriver("create table T (a int, b int) stored" + " as orc tblproperties('transactional'='true')");
+
+    //Create source table - Tstage
+    runStatementOnDriver("create table Tstage (a int, b int) stored" + " as orc tblproperties('transactional'='true')");
+
+    //this creates an ORC data file with correct schema under table root
+    runStatementOnDriver("insert into Tstage values(1,2),(3,4),(5,6)");
+    final int[][] rows = { { 3 } };
+
+    //check Tstage statistics
+    List<String> rsTStageProperties = runStatementOnDriver("show tblproperties Tstage");
+    Assert.assertEquals("COLUMN_STATS_ACCURATE of Tstage table", true,
+        rsTStageProperties.contains("COLUMN_STATS_ACCURATE\t{\"BASIC_STATS\":\"true\"}"));
+    Assert.assertEquals("numRows of Tstage table", true, rsTStageProperties.contains("numRows\t3"));
+    Assert.assertEquals("numFiles of Tstage table", true, rsTStageProperties.contains("numFiles\t1"));
+
+    //now we have an archive Tstage table
+    runStatementOnDriver("export table Tstage to '" + getWarehouseDir() + "/1'");
+
+    //load T
+    runStatementOnDriver("import table T from '" + getWarehouseDir() + "/1'");
+
+    //check basic stats in tblproperties T
+    List<String> rsTProperties = runStatementOnDriver("show tblproperties T");

Review comment:
       Can test the import with hive.stats.autogather=true and see if stats are accurate for both partitioned and non-partitioned tables.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2452,9 +2448,16 @@ private Partition loadPartitionInternal(Path loadPath, Table tbl, Map<String, St
         skewedInfo.setSkewedColValueLocationMaps(skewedColValueLocationMaps);
         newCreatedTpart.getSd().setSkewedInfo(skewedInfo);
       }
-      if (!this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+
+      //if hive.stats.autogather = false or resetStatistics = true then
+      //clear partition column stats and set basic stats to false
+      if (!this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER) || resetStatistics) {
+        LOG.debug(

Review comment:
       Nit: Use upper-case for first character of log message.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2452,9 +2448,16 @@ private Partition loadPartitionInternal(Path loadPath, Table tbl, Map<String, St
         skewedInfo.setSkewedColValueLocationMaps(skewedColValueLocationMaps);
         newCreatedTpart.getSd().setSkewedInfo(skewedInfo);
       }
-      if (!this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+
+      //if hive.stats.autogather = false or resetStatistics = true then
+      //clear partition column stats and set basic stats to false
+      if (!this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER) || resetStatistics) {
+        LOG.debug(
+            "clear partition column statistics and setting basic stats to false for " + newTPart.getCompleteName());
+        StatsSetupConst.clearColumnStatsState(newTPart.getParameters());

Review comment:
       clearColumnStatsState call is redundant as setBasicStatsState(false) will remove the column stats as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sankarh commented on a change in pull request #1370: HIVE-23887:Reset Columns stats in Export Statement

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1370:
URL: https://github.com/apache/hive/pull/1370#discussion_r467462298



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/TableSerializer.java
##########
@@ -96,6 +98,11 @@ private Table updatePropertiesInTable(Table table, ReplicationSpec additionalPro
       // uncomment this else section, but currently unneeded. Will require a lot of golden file
       // regen if we do so.
     }
+
+    if (parameters != null && parameters.containsKey(COLUMN_STATS_ACCURATE)) {

Review comment:
       There is already parameters != null check above in this method. Can we move this code there itself?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sankarh commented on a change in pull request #1370: HIVE-23887:Reset Columns stats in Export Statement

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1370:
URL: https://github.com/apache/hive/pull/1370#discussion_r467461932



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +567,28 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+  @Test public void testExportPartitionedOrcWithOutColumnStats() throws Exception {
+
+    runStatementOnDriver("drop table if exists T");
+    runStatementOnDriver("drop table if exists Tstage");
+    runStatementOnDriver("create table T (a int, b int) partitioned by (p int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //Tstage is the target table
+    runStatementOnDriver("create table Tstage (a int, b int) partitioned by (p int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //this creates an ORC data file with correct schema under table root
+    runStatementOnDriver("insert into Tstage values(1,2,10),(3,4,11),(5,6,12)");
+    final int[][] rows = { { 3 } };
+    //now we have an archive with 3 partitions
+    runStatementOnDriver("export table Tstage to '" + getWarehouseDir() + "/1'");

Review comment:
       Check the COLUMN_STATS_ACCURATE property is true on source table and partition before and after export operation to confirm we don't overwrite anything in their metadata.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +567,28 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+  @Test public void testExportPartitionedOrcWithOutColumnStats() throws Exception {
+
+    runStatementOnDriver("drop table if exists T");
+    runStatementOnDriver("drop table if exists Tstage");
+    runStatementOnDriver("create table T (a int, b int) partitioned by (p int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //Tstage is the target table
+    runStatementOnDriver("create table Tstage (a int, b int) partitioned by (p int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //this creates an ORC data file with correct schema under table root
+    runStatementOnDriver("insert into Tstage values(1,2,10),(3,4,11),(5,6,12)");
+    final int[][] rows = { { 3 } };
+    //now we have an archive with 3 partitions
+    runStatementOnDriver("export table Tstage to '" + getWarehouseDir() + "/1'");
+
+    //load T
+    runStatementOnDriver("import table T from '" + getWarehouseDir() + "/1'");
+    List<String> rsProperties = runStatementOnDriver("show tblproperties T");

Review comment:
       Can we also check the partition properties?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sankarh commented on a change in pull request #1370: HIVE-23887:Reset Columns stats in Export Statement

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1370:
URL: https://github.com/apache/hive/pull/1370#discussion_r471511349



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+

Review comment:
       Nit: Remove extra blank line.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -3185,7 +3185,7 @@ public void loadTable(Path loadPath, String tableName, LoadFileType loadFileType
     //column stats will be inaccurate
     if (resetStatistics) {
       LOG.debug("Clearing table statistics for " + tbl.getDbName() + "." + tbl.getTableName());
-      StatsSetupConst.clearColumnStatsState(tbl.getParameters());
+      StatsSetupConst.setBasicStatsState(tbl.getParameters(),StatsSetupConst.FALSE);

Review comment:
       Even during loadPartition, we need to reset stats in table properties as well which is missing now.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {
+
+    runStatementOnDriver("drop table if exists T");
+    runStatementOnDriver("drop table if exists Tstage");
+    runStatementOnDriver("create table T (a int, b int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //Tstage is the target table

Review comment:
       Nit: Add single blank line before a comment line. Check below places too.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -3185,7 +3185,7 @@ public void loadTable(Path loadPath, String tableName, LoadFileType loadFileType
     //column stats will be inaccurate
     if (resetStatistics) {
       LOG.debug("Clearing table statistics for " + tbl.getDbName() + "." + tbl.getTableName());
-      StatsSetupConst.clearColumnStatsState(tbl.getParameters());
+      StatsSetupConst.setBasicStatsState(tbl.getParameters(),StatsSetupConst.FALSE);

Review comment:
       Can be combined with previous if block which also does the same thing. Also, the comment says "column stats will be inaccurate" but we are resetting both basic and column stats. Correct the log message and comment accordingly.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {
+
+    runStatementOnDriver("drop table if exists T");
+    runStatementOnDriver("drop table if exists Tstage");
+    runStatementOnDriver("create table T (a int, b int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //Tstage is the target table
+    runStatementOnDriver("create table Tstage (a int, b int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //this creates an ORC data file with correct schema under table root
+    runStatementOnDriver("insert into Tstage values(1,2),(3,4),(5,6)");
+    final int[][] rows = { { 3 } };

Review comment:
       Can we check if stats are set in source table Tstage?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -3185,7 +3185,7 @@ public void loadTable(Path loadPath, String tableName, LoadFileType loadFileType
     //column stats will be inaccurate
     if (resetStatistics) {
       LOG.debug("Clearing table statistics for " + tbl.getDbName() + "." + tbl.getTableName());
-      StatsSetupConst.clearColumnStatsState(tbl.getParameters());
+      StatsSetupConst.setBasicStatsState(tbl.getParameters(),StatsSetupConst.FALSE);

Review comment:
       Nit: Need a space after comma ",".

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -404,6 +404,21 @@ public void testImportPartitionedOrc() throws Exception {
 
     //load T
     runStatementOnDriver("import table T from '" + getWarehouseDir() + "/1'");
+
+    //check basic stats in tblproperties
+    List<String> rsProperties = runStatementOnDriver("show tblproperties T");
+    Assert
+        .assertEquals("COLUMN_STATS_ACCURATE of imported table", rsProperties.contains("COLUMN_STATS_ACCURATE"), false);
+
+    //check basic stats in partition properties
+    List<String> rsPartitions = runStatementOnDriver("show partitions T");
+    for(String rsPartition : rsPartitions) {
+      List<String> rsPartitionProperties = runStatementOnDriver("describe formatted T partition("+ rsPartition +")");

Review comment:
       Nit: Add space before and after "+".

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {

Review comment:
       Nit: The method definition and annotation can be in separate lines.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -404,6 +404,21 @@ public void testImportPartitionedOrc() throws Exception {
 
     //load T
     runStatementOnDriver("import table T from '" + getWarehouseDir() + "/1'");
+
+    //check basic stats in tblproperties
+    List<String> rsProperties = runStatementOnDriver("show tblproperties T");
+    Assert
+        .assertEquals("COLUMN_STATS_ACCURATE of imported table", rsProperties.contains("COLUMN_STATS_ACCURATE"), false);
+
+    //check basic stats in partition properties
+    List<String> rsPartitions = runStatementOnDriver("show partitions T");
+    for(String rsPartition : rsPartitions) {

Review comment:
       Nit: Add space before "("

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {
+

Review comment:
       Nit: Remove this blank line.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {
+
+    runStatementOnDriver("drop table if exists T");
+    runStatementOnDriver("drop table if exists Tstage");
+    runStatementOnDriver("create table T (a int, b int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //Tstage is the target table
+    runStatementOnDriver("create table Tstage (a int, b int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //this creates an ORC data file with correct schema under table root
+    runStatementOnDriver("insert into Tstage values(1,2),(3,4),(5,6)");
+    final int[][] rows = { { 3 } };
+    //now we have an archive with 3 partitions

Review comment:
       The comment seems incorrect. This is non-partitioned table.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {
+
+    runStatementOnDriver("drop table if exists T");
+    runStatementOnDriver("drop table if exists Tstage");
+    runStatementOnDriver("create table T (a int, b int) stored"
+        + " as orc tblproperties('transactional'='true')");
+    //Tstage is the target table

Review comment:
       Comment incorrect. Tstage is source table.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -566,4 +581,33 @@ public void testMMExportAborted() throws Exception {
         TestTxnCommands2.stringifyValues(data), rs);
 
   }
-}
+
+
+  @Test public void testImportOrc() throws Exception {

Review comment:
       Can we also check if import sets stats to true in case of autostats_gather is set to true? For both partitioned and non-partitioned tables.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java
##########
@@ -404,6 +404,21 @@ public void testImportPartitionedOrc() throws Exception {
 
     //load T
     runStatementOnDriver("import table T from '" + getWarehouseDir() + "/1'");
+
+    //check basic stats in tblproperties
+    List<String> rsProperties = runStatementOnDriver("show tblproperties T");

Review comment:
       Can we check if stats are set in source table Tstage and their partitions? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org