Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/08/17 10:03:19 UTC

[GitHub] [hive] adesh-rao opened a new pull request #1050: HIVE-23358: MSCK Repair should remove insignificant 0's from numeric partition values

adesh-rao opened a new pull request #1050:
URL: https://github.com/apache/hive/pull/1050


   
   
   References for converting Hive types to Java:
   Hive datatypes: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
   Java Datatypes: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] commented on pull request #1050: HIVE-23358: MSCK Repair should remove insignificant 0's from numeric partition values

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #1050:
URL: https://github.com/apache/hive/pull/1050#issuecomment-667747595


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




[GitHub] [hive] sankarh commented on a change in pull request #1050: HIVE-23358: MSCK Repair should remove insignificant 0's from numeric partition values

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1050:
URL: https://github.com/apache/hive/pull/1050#discussion_r465626819



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##########
@@ -1493,6 +1497,39 @@ public static String getPartitionName(Path tablePath, Path partitionPath, Set<St
     return result;
   }
 
+  public static String getNormalisedPartitionValue(String partitionValue, String type) {
+
+    LOG.debug("Converting '" + partitionValue + "' to type: '" + type + "'.");
+
+    if (type.equalsIgnoreCase("tinyint")
+    || type.equalsIgnoreCase("smallint")
+    || type.equalsIgnoreCase("int")){
+      return Integer.toString(Integer.parseInt(partitionValue));
+    } else if (type.equalsIgnoreCase("bigint")){
+      return Long.toString(Long.parseLong(partitionValue));
+    } else if (type.equalsIgnoreCase("float")){
+      return Float.toString(Float.parseFloat(partitionValue));
+    } else if (type.equalsIgnoreCase("double")){
+      return Double.toString(Double.parseDouble(partitionValue));
+    } else if (type.startsWith("decimal")){
+      // Decimal datatypes are stored like decimal(10,10)
+      return new BigDecimal(partitionValue).stripTrailingZeros().toPlainString();
+    }
+    return partitionValue;

Review comment:
       As a follow-up, do we need to validate date and timestamp values before adding partitions? Are we checking them during inserts or in the add partition command?
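
To make the follow-up question concrete, here is a minimal sketch of what validating a date partition value could look like. It is not part of the PR and does not describe what Hive does today: the class and method names are hypothetical, and it assumes date partition values are written as yyyy-MM-dd.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

class DatePartitionValueSketch {

  // Hypothetical helper (not in the PR): true when the value parses as a
  // strict ISO-8601 calendar date of the form yyyy-MM-dd.
  static boolean isValidDate(String partitionValue) {
    if (partitionValue == null) {
      return false;
    }
    try {
      // LocalDate.parse(CharSequence) uses DateTimeFormatter.ISO_LOCAL_DATE.
      LocalDate.parse(partitionValue);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValidDate("2020-08-17")); // well-formed date
    System.out.println(isValidDate("2020-02-30")); // no such calendar day
    System.out.println(isValidDate("17-08-2020")); // wrong field order
  }
}
```

Whether an invalid value should be skipped, rejected, or only logged is a separate decision that this sketch leaves open.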

##########
File path: ql/src/test/queries/clientpositive/msck_repair_7.q
##########
@@ -0,0 +1,63 @@
+DROP TABLE IF EXISTS repairtable_n7_1;
+DROP TABLE IF EXISTS repairtable_n7_2;
+DROP TABLE IF EXISTS repairtable_n7_3;
+DROP TABLE IF EXISTS repairtable_n7_4;
+DROP TABLE IF EXISTS repairtable_n7_5;
+DROP TABLE IF EXISTS repairtable_n7_6;
+DROP TABLE IF EXISTS repairtable_n7_7;
+
+CREATE EXTERNAL TABLE repairtable_n7_1(key INT) PARTITIONED BY (p1 TINYINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_1/';
+CREATE EXTERNAL TABLE repairtable_n7_2(key INT) PARTITIONED BY (p1 SMALLINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_2/';
+CREATE EXTERNAL TABLE repairtable_n7_3(key INT) PARTITIONED BY (p1 INT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_3/';
+CREATE EXTERNAL TABLE repairtable_n7_4(key INT) PARTITIONED BY (p1 BIGINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_4/';
+CREATE EXTERNAL TABLE repairtable_n7_5(key INT) PARTITIONED BY (p1 FLOAT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_5/';
+CREATE EXTERNAL TABLE repairtable_n7_6(key INT) PARTITIONED BY (p1 DOUBLE) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_6/';
+CREATE EXTERNAL TABLE repairtable_n7_7(key INT) PARTITIONED BY (p1 DECIMAL(10,10)) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_7/';

Review comment:
       Add another table with a STRING type partition column, where the 0s shouldn't be stripped by MSCK repair.
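
The expectation behind this comment can be sketched in a standalone class that copies the branching of getNormalisedPartitionValue from the diff above. The class name and the sample values are assumptions chosen for illustration; the real method lives in MetaStoreServerUtils and also logs at debug level.

```java
import java.math.BigDecimal;

class NormalisedPartitionValueSketch {

  // Same type dispatch as the method in the diff: numeric types are parsed
  // and re-printed, every other type is returned untouched.
  static String normalise(String partitionValue, String type) {
    if (type.equalsIgnoreCase("tinyint")
        || type.equalsIgnoreCase("smallint")
        || type.equalsIgnoreCase("int")) {
      return Integer.toString(Integer.parseInt(partitionValue));
    } else if (type.equalsIgnoreCase("bigint")) {
      return Long.toString(Long.parseLong(partitionValue));
    } else if (type.equalsIgnoreCase("float")) {
      return Float.toString(Float.parseFloat(partitionValue));
    } else if (type.equalsIgnoreCase("double")) {
      return Double.toString(Double.parseDouble(partitionValue));
    } else if (type.startsWith("decimal")) {
      return new BigDecimal(partitionValue).stripTrailingZeros().toPlainString();
    }
    return partitionValue;
  }

  public static void main(String[] args) {
    System.out.println(normalise("01", "int"));               // prints 1
    System.out.println(normalise("0.500", "decimal(10,10)")); // prints 0.5
    System.out.println(normalise("01", "string"));            // prints 01
  }
}
```

A STRING partition directory named p1=01 should therefore still be registered as p1=01, which is what the additional table would check.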

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##########
@@ -1493,6 +1497,39 @@ public static String getPartitionName(Path tablePath, Path partitionPath, Set<St
     return result;
   }
 
+  public static String getNormalisedPartitionValue(String partitionValue, String type) {
+
+    LOG.debug("Converting '" + partitionValue + "' to type: '" + type + "'.");
+
+    if (type.equalsIgnoreCase("tinyint")
+    || type.equalsIgnoreCase("smallint")
+    || type.equalsIgnoreCase("int")){
+      return Integer.toString(Integer.parseInt(partitionValue));
+    } else if (type.equalsIgnoreCase("bigint")){
+      return Long.toString(Long.parseLong(partitionValue));
+    } else if (type.equalsIgnoreCase("float")){
+      return Float.toString(Float.parseFloat(partitionValue));
+    } else if (type.equalsIgnoreCase("double")){
+      return Double.toString(Double.parseDouble(partitionValue));
+    } else if (type.startsWith("decimal")){
+      // Decimal datatypes are stored like decimal(10,10)
+      return new BigDecimal(partitionValue).stripTrailingZeros().toPlainString();
+    }
+    return partitionValue;
+  }
+
+  public static Map<String, String> getPartitionColtoTypeMap(List<FieldSchema> partitionCols) {
+    Map<String, String> typeMap = new HashMap<>();
+
+    if (!(partitionCols == null || partitionCols.isEmpty())) {

Review comment:
       Avoid the outer negation and use if ((partitionCols != null) && !partitionCols.isEmpty()). Even the isEmpty() check isn't needed here, as the loop below won't execute if the list is empty.
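
A minimal sketch of the suggested shape, with only the null check in front of the loop. The diff does not show the loop body, so the body here is an assumption, and Col is a hypothetical stand-in for Hive's FieldSchema so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionColTypeMapSketch {

  // Hypothetical stand-in for FieldSchema: just a column name and type.
  static final class Col {
    final String name;
    final String type;

    Col(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  static Map<String, String> getPartitionColToTypeMap(List<Col> partitionCols) {
    Map<String, String> typeMap = new HashMap<>();
    // Only null needs guarding: iterating an empty list is a no-op.
    if (partitionCols != null) {
      for (Col col : partitionCols) {
        typeMap.put(col.name, col.type); // assumed loop body
      }
    }
    return typeMap;
  }
}
```

Both a null and an empty list yield an empty map, so the behaviour matches the negated condition in the patch.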






[GitHub] [hive] sankarh commented on a change in pull request #1050: HIVE-23358: MSCK Repair should remove insignificant 0's from numeric partition values

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1050:
URL: https://github.com/apache/hive/pull/1050#discussion_r467460705



##########
File path: ql/src/test/queries/clientpositive/msck_repair_7.q
##########
@@ -0,0 +1,71 @@
+DROP TABLE IF EXISTS repairtable_n7_1;
+DROP TABLE IF EXISTS repairtable_n7_2;
+DROP TABLE IF EXISTS repairtable_n7_3;
+DROP TABLE IF EXISTS repairtable_n7_4;
+DROP TABLE IF EXISTS repairtable_n7_5;
+DROP TABLE IF EXISTS repairtable_n7_6;
+DROP TABLE IF EXISTS repairtable_n7_7;
+DROP TABLE IF EXISTS repairtable_n7_8;
+
+CREATE EXTERNAL TABLE repairtable_n7_1(key INT) PARTITIONED BY (p1 TINYINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_1/';
+CREATE EXTERNAL TABLE repairtable_n7_2(key INT) PARTITIONED BY (p1 SMALLINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_2/';
+CREATE EXTERNAL TABLE repairtable_n7_3(key INT) PARTITIONED BY (p1 INT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_3/';
+CREATE EXTERNAL TABLE repairtable_n7_4(key INT) PARTITIONED BY (p1 BIGINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_4/';
+CREATE EXTERNAL TABLE repairtable_n7_5(key INT) PARTITIONED BY (p1 FLOAT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_5/';
+CREATE EXTERNAL TABLE repairtable_n7_6(key INT) PARTITIONED BY (p1 DOUBLE) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_6/';
+CREATE EXTERNAL TABLE repairtable_n7_7(key INT) PARTITIONED BY (p1 DECIMAL(10,10)) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_7/';
+CREATE EXTERNAL TABLE repairtable_n7_8(key INT) PARTITIONED BY (p1 string) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_8/';
+
+MSCK REPAIR TABLE repairtable_n7_1;
+MSCK REPAIR TABLE repairtable_n7_2;
+MSCK REPAIR TABLE repairtable_n7_3;
+MSCK REPAIR TABLE repairtable_n7_4;
+MSCK REPAIR TABLE repairtable_n7_5;
+MSCK REPAIR TABLE repairtable_n7_6;
+MSCK REPAIR TABLE repairtable_n7_7;
+MSCK REPAIR TABLE repairtable_n7_8;
+
+show partitions repairtable_n7_1;
+show partitions repairtable_n7_2;
+show partitions repairtable_n7_3;
+show partitions repairtable_n7_4;
+show partitions repairtable_n7_5;
+show partitions repairtable_n7_6;
+show partitions repairtable_n7_7;
+show partitions repairtable_n7_8;
+
+dfs  ${system:test.dfs.mkdir} -p ${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_1/p1=01;

Review comment:
       Under each numeric table, add another partition directory with the same value but a different number of leading or trailing 0s. MSCK should add only one partition, pointing to just one directory. I'm not sure what the right behavior is in this case, but it should be well documented.
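
The collision this comment describes can be illustrated with a small sketch that groups raw directory values by their normalised form for an INT partition column. The class, the method, and the sample directory values are hypothetical. The sketch only detects the collision and takes no position on which directory the partition should point to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DuplicatePartitionDirSketch {

  // Groups raw directory values (the part after "p1=") by the value they
  // normalise to. A group with more than one entry means several
  // directories would map to the same partition.
  static Map<String, List<String>> groupByNormalisedInt(List<String> dirValues) {
    Map<String, List<String>> groups = new TreeMap<>();
    for (String raw : dirValues) {
      String normalised = Integer.toString(Integer.parseInt(raw));
      groups.computeIfAbsent(normalised, k -> new ArrayList<>()).add(raw);
    }
    return groups;
  }

  public static void main(String[] args) {
    // p1=1, p1=01 and p1=001 all normalise to the same partition value.
    System.out.println(groupByNormalisedInt(Arrays.asList("1", "01", "001", "2")));
  }
}
```

A test along these lines would make the chosen behavior visible in the .q.out file, which is one way to document it.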






[GitHub] [hive] adesh-rao commented on pull request #1050: HIVE-23358: MSCK Repair should remove insignificant 0's from numeric partition values

Posted by GitBox <gi...@apache.org>.
adesh-rao commented on pull request #1050:
URL: https://github.com/apache/hive/pull/1050#issuecomment-638329732


   @sankarh Can you take a look at the PR?




[GitHub] [hive] github-actions[bot] closed pull request #1050: HIVE-23358: MSCK Repair should remove insignificant 0's from numeric partition values

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #1050:
URL: https://github.com/apache/hive/pull/1050


   




[GitHub] [hive] sankarh commented on a change in pull request #1050: HIVE-23358: MSCK Repair should remove insignificant 0's from numeric partition values

Posted by GitBox <gi...@apache.org>.
sankarh commented on a change in pull request #1050:
URL: https://github.com/apache/hive/pull/1050#discussion_r467460705



##########
File path: ql/src/test/queries/clientpositive/msck_repair_7.q
##########
@@ -0,0 +1,71 @@
+DROP TABLE IF EXISTS repairtable_n7_1;
+DROP TABLE IF EXISTS repairtable_n7_2;
+DROP TABLE IF EXISTS repairtable_n7_3;
+DROP TABLE IF EXISTS repairtable_n7_4;
+DROP TABLE IF EXISTS repairtable_n7_5;
+DROP TABLE IF EXISTS repairtable_n7_6;
+DROP TABLE IF EXISTS repairtable_n7_7;
+DROP TABLE IF EXISTS repairtable_n7_8;
+
+CREATE EXTERNAL TABLE repairtable_n7_1(key INT) PARTITIONED BY (p1 TINYINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_1/';
+CREATE EXTERNAL TABLE repairtable_n7_2(key INT) PARTITIONED BY (p1 SMALLINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_2/';
+CREATE EXTERNAL TABLE repairtable_n7_3(key INT) PARTITIONED BY (p1 INT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_3/';
+CREATE EXTERNAL TABLE repairtable_n7_4(key INT) PARTITIONED BY (p1 BIGINT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_4/';
+CREATE EXTERNAL TABLE repairtable_n7_5(key INT) PARTITIONED BY (p1 FLOAT) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_5/';
+CREATE EXTERNAL TABLE repairtable_n7_6(key INT) PARTITIONED BY (p1 DOUBLE) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_6/';
+CREATE EXTERNAL TABLE repairtable_n7_7(key INT) PARTITIONED BY (p1 DECIMAL(10,10)) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_7/';
+CREATE EXTERNAL TABLE repairtable_n7_8(key INT) PARTITIONED BY (p1 string) stored as ORC location '${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_8/';
+
+MSCK REPAIR TABLE repairtable_n7_1;
+MSCK REPAIR TABLE repairtable_n7_2;
+MSCK REPAIR TABLE repairtable_n7_3;
+MSCK REPAIR TABLE repairtable_n7_4;
+MSCK REPAIR TABLE repairtable_n7_5;
+MSCK REPAIR TABLE repairtable_n7_6;
+MSCK REPAIR TABLE repairtable_n7_7;
+MSCK REPAIR TABLE repairtable_n7_8;
+
+show partitions repairtable_n7_1;
+show partitions repairtable_n7_2;
+show partitions repairtable_n7_3;
+show partitions repairtable_n7_4;
+show partitions repairtable_n7_5;
+show partitions repairtable_n7_6;
+show partitions repairtable_n7_7;
+show partitions repairtable_n7_8;
+
+dfs  ${system:test.dfs.mkdir} -p ${system:test.tmp.dir}/apps/hive/warehouse/test.db/repairtable_n7_1/p1=01;

Review comment:
       Under a few of the numeric column tables, add another partition directory with the same value but a different number of leading or trailing 0s. MSCK should add only one partition, pointing to just one directory. I'm not sure what the right behavior is in this case, but it should be well documented.






[GitHub] [hive] sankarh merged pull request #1050: HIVE-23358: MSCK Repair should remove insignificant 0's from numeric partition values

Posted by GitBox <gi...@apache.org>.
sankarh merged pull request #1050:
URL: https://github.com/apache/hive/pull/1050


   

