You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/05/31 11:57:27 UTC

[GitHub] [hive] lcspinter opened a new pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

lcspinter opened a new pull request #2333:
URL: https://github.com/apache/hive/pull/2333


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   Enhance table create syntax with support to partition transforms:
   
   `CREATE TABLE ... PARTITIONED BY SPEC(
   year_field year,
   month_field month,
   day_field day,
   hour_field hour,
   truncate_field truncate[3],
   bucket_field bucket[5],
   identity_field identity
   ) STORED BY ICEBERG;`
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   Previously we had support of partition transforms through `TBLPROPERTIES`, but that syntax is a bit cumbersome. For instance: 
   `CREATE EXTERNAL TABLE (id BIGINT, part_field DATE) default.part_test STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"part_field_year","transform":"year","source-id":2,"field-id":1000}]}', 'write.format.default'='AVRO', 'iceberg.catalog'='default_iceberg')`
   
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   Unit test, manual test
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter merged pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter merged pull request #2333:
URL: https://github.com/apache/hive/pull/2333


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r646497257



##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g
##########
@@ -385,6 +385,7 @@ KW_DATACONNECTORS: 'CONNECTORS';
 KW_TYPE: 'TYPE';
 KW_URL: 'URL';
 KW_REMOTE: 'REMOTE';
+KW_SPEC: 'SPEC';

Review comment:
       No. I added `KW_SPEC` as a non-reserved keyword. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643139820



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -13455,6 +13469,18 @@ ASTNode analyzeCreateTable(
       }
     }
 
+    if (partitionTransformSpecExists) {
+      try {
+        HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler());

Review comment:
       Can we make this check before, at line 13409? That way we could skip parsing the partition spec unnecessarily first




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r642960021



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap
     }
   }
 
-  private static PartitionSpec spec(Schema schema, Properties properties,
+  private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) {
+    if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname))

Review comment:
       Maybe a util class to get Icebeerg objects from the `SessionState`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643752741



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class PartitionTransform {
+
+  private static final Map<Integer, TransformTypes> TRANSFORMS = Stream
+      .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY },
+          { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH },
+          { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR },
+          { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } })
+      .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1]));
+
+  /**
+   * Parse the partition transform specifications from the AST Tree node.
+   * @param node AST Tree node, must be not null
+   * @return list of partition transforms
+   */
+  public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) {
+    List<PartitionTransformSpec> partSpecList = new ArrayList<>();
+    for (int i = 0; i < node.getChildCount(); i++) {
+      PartitionTransformSpec spec = new PartitionTransformSpec();
+      ASTNode child = (ASTNode) node.getChild(i);
+      for (int j = 0; j < child.getChildCount(); j++) {
+        ASTNode grandChild = (ASTNode) child.getChild(j);
+        switch (grandChild.getToken().getType()) {
+        case HiveParser.TOK_IDENTITY:
+        case HiveParser.TOK_YEAR:
+        case HiveParser.TOK_MONTH:
+        case HiveParser.TOK_DAY:
+        case HiveParser.TOK_HOUR:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          break;
+        case HiveParser.TOK_TRUNCATE:
+        case HiveParser.TOK_BUCKET:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText());
+          break;
+        default:
+          spec.name = grandChild.getText();
+        }
+      }
+      partSpecList.add(spec);
+    }
+
+    return partSpecList;
+  }
+
+  public enum TransformTypes {
+    IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET
+  }
+
+  public static class PartitionTransformSpec {
+    public String name;
+    public TransformTypes transformType;
+    public int transformParam;

Review comment:
       In the case of TRUNCATE and BUCKET, the existence of the transform param is validated during the query parsing, but I agree It can be misleading for transform types with additional params. 

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class PartitionTransform {
+
+  private static final Map<Integer, TransformTypes> TRANSFORMS = Stream
+      .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY },
+          { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH },
+          { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR },
+          { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } })
+      .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1]));
+
+  /**
+   * Parse the partition transform specifications from the AST Tree node.
+   * @param node AST Tree node, must be not null
+   * @return list of partition transforms
+   */
+  public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) {
+    List<PartitionTransformSpec> partSpecList = new ArrayList<>();
+    for (int i = 0; i < node.getChildCount(); i++) {
+      PartitionTransformSpec spec = new PartitionTransformSpec();
+      ASTNode child = (ASTNode) node.getChild(i);
+      for (int j = 0; j < child.getChildCount(); j++) {
+        ASTNode grandChild = (ASTNode) child.getChild(j);
+        switch (grandChild.getToken().getType()) {
+        case HiveParser.TOK_IDENTITY:

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on pull request #2333:
URL: https://github.com/apache/hive/pull/2333#issuecomment-851846758


   @jcamachor @zabetak @marton-bod @pvary @szlta Could you please review this PR? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643008410



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap
     }
   }
 
-  private static PartitionSpec spec(Schema schema, Properties properties,
+  private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) {
+    if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname))

Review comment:
       + 1
   I'm working with the QueryState too on a different jira, and we would definitely benefit from some util methods to simplify these operations.

##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap
     }
   }
 
-  private static PartitionSpec spec(Schema schema, Properties properties,
+  private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) {
+    if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname))

Review comment:
        +1
   I'm working with the QueryState too on a different jira, and we would definitely benefit from some util methods to simplify these operations.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643759841



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -13455,6 +13469,18 @@ ASTNode analyzeCreateTable(
       }
     }
 
+    if (partitionTransformSpecExists) {
+      try {
+        HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler());

Review comment:
       Ah I see. Thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643747143



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -155,6 +155,36 @@ public void after() throws Exception {
     HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionTransform() {
+    Schema schema = new Schema(
+        optional(1, "id", Types.LongType.get()),
+        optional(2, "year_field", Types.DateType.get()),
+        optional(3, "month_field", Types.TimestampType.withZone()),
+        optional(4, "day_field", Types.TimestampType.withoutZone()),
+        optional(5, "hour_field", Types.TimestampType.withoutZone()),
+        optional(6, "truncate_field", Types.StringType.get()),
+        optional(7, "bucket_field", Types.StringType.get()),
+        optional(8, "identity_field", Types.StringType.get())
+    );
+    PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field")
+        .hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2)
+        .identity("identity_field").build();
+    String tableName = "part_test";

Review comment:
       Removed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643208790



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap
     }
   }
 
-  private static PartitionSpec spec(Schema schema, Properties properties,
+  private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) {
+    if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname))

Review comment:
       I'll create a new `IcebergSessionUtil` for providing some util methods, we can sync up with @lcspinter on the implementation




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r642996447



##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -3010,3 +3010,5 @@ const string TABLE_BUCKETING_VERSION = "bucketing_version",
 const string DRUID_CONFIG_PREFIX = "druid.",
 const string JDBC_CONFIG_PREFIX = "hive.sql.",
 const string TABLE_IS_CTAS = "created_with_ctas",
+const string PARTITION_TRANSFER_SPEC = "partition_transfer_spec",

Review comment:
       how come this is partition_transfer_spec, unlike elsewhere?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643135880



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class PartitionTransform {
+
+  private static final Map<Integer, TransformTypes> TRANSFORMS = Stream
+      .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY },
+          { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH },
+          { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR },
+          { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } })
+      .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1]));
+
+  /**
+   * Parse the partition transform specifications from the AST Tree node.
+   * @param node AST Tree node, must be not null
+   * @return list of partition transforms
+   */
+  public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) {
+    List<PartitionTransformSpec> partSpecList = new ArrayList<>();
+    for (int i = 0; i < node.getChildCount(); i++) {
+      PartitionTransformSpec spec = new PartitionTransformSpec();
+      ASTNode child = (ASTNode) node.getChild(i);
+      for (int j = 0; j < child.getChildCount(); j++) {
+        ASTNode grandChild = (ASTNode) child.getChild(j);
+        switch (grandChild.getToken().getType()) {
+        case HiveParser.TOK_IDENTITY:
+        case HiveParser.TOK_YEAR:
+        case HiveParser.TOK_MONTH:
+        case HiveParser.TOK_DAY:
+        case HiveParser.TOK_HOUR:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          break;
+        case HiveParser.TOK_TRUNCATE:
+        case HiveParser.TOK_BUCKET:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText());
+          break;
+        default:
+          spec.name = grandChild.getText();
+        }
+      }
+      partSpecList.add(spec);
+    }
+
+    return partSpecList;
+  }
+
+  public enum TransformTypes {
+    IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET
+  }
+
+  public static class PartitionTransformSpec {
+    public String name;
+    public TransformTypes transformType;
+    public int transformParam;

Review comment:
       Should we make this Integer or Optional<Integer>? This would have a 0 value by default for those spec types too that don't support any params, which can be misleading. Although granted, 0 doesn't make much sense for either bucketing or truncate, but maybe semantically it would be a bit better to make this nullable/empty.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r642966076



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -155,6 +155,36 @@ public void after() throws Exception {
     HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionTransform() {
+    Schema schema = new Schema(
+        optional(1, "id", Types.LongType.get()),
+        optional(2, "year_field", Types.DateType.get()),
+        optional(3, "month_field", Types.TimestampType.withZone()),
+        optional(4, "day_field", Types.TimestampType.withoutZone()),
+        optional(5, "hour_field", Types.TimestampType.withoutZone()),
+        optional(6, "truncate_field", Types.StringType.get()),
+        optional(7, "bucket_field", Types.StringType.get()),
+        optional(8, "identity_field", Types.StringType.get())
+    );
+    PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field")
+        .hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2)
+        .identity("identity_field").build();
+    String tableName = "part_test";
+
+    TableIdentifier identifier = TableIdentifier.of("default", tableName);
+    shell.executeStatement("CREATE EXTERNAL TABLE " + identifier +
+        " PARTITIONED BY SPEC (year_field year, month_field month, day_field day, hour_field hour, " +
+        "truncate_field truncate[2], bucket_field bucket[2], identity_field identity)" +
+        " STORED BY '" + HiveIcebergStorageHandler.class.getName() + "' " +
+        testTables.locationForCreateTableSQL(identifier) +
+        "TBLPROPERTIES ('" + InputFormatConfig.TABLE_SCHEMA + "'='" +
+        SchemaParser.toJson(schema) + "', " +
+        "'" + InputFormatConfig.CATALOG_NAME + "'='" + Catalogs.ICEBERG_DEFAULT_CATALOG_NAME + "')");

Review comment:
       Do we need this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643137193



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class PartitionTransform {
+
+  private static final Map<Integer, TransformTypes> TRANSFORMS = Stream
+      .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY },
+          { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH },
+          { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR },
+          { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } })
+      .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1]));
+
+  /**
+   * Parse the partition transform specifications from the AST Tree node.
+   * @param node AST Tree node, must be not null
+   * @return list of partition transforms
+   */
+  public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) {
+    List<PartitionTransformSpec> partSpecList = new ArrayList<>();
+    for (int i = 0; i < node.getChildCount(); i++) {
+      PartitionTransformSpec spec = new PartitionTransformSpec();
+      ASTNode child = (ASTNode) node.getChild(i);
+      for (int j = 0; j < child.getChildCount(); j++) {
+        ASTNode grandChild = (ASTNode) child.getChild(j);
+        switch (grandChild.getToken().getType()) {
+        case HiveParser.TOK_IDENTITY:

Review comment:
       nit: indentation missing after switch clause




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #2333:
URL: https://github.com/apache/hive/pull/2333#issuecomment-852158270


   Looks great @lcspinter! Just a few questions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643744591



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class PartitionTransform {
+
+  private static final Map<Integer, TransformTypes> TRANSFORMS = Stream
+      .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY },
+          { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH },
+          { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR },
+          { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } })
+      .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1]));
+
+  /**
+   * Parse the partition transform specifications from the AST Tree node.
+   * @param node AST Tree node, must be not null
+   * @return list of partition transforms
+   */
+  public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) {
+    List<PartitionTransformSpec> partSpecList = new ArrayList<>();
+    for (int i = 0; i < node.getChildCount(); i++) {
+      PartitionTransformSpec spec = new PartitionTransformSpec();
+      ASTNode child = (ASTNode) node.getChild(i);
+      for (int j = 0; j < child.getChildCount(); j++) {
+        ASTNode grandChild = (ASTNode) child.getChild(j);
+        switch (grandChild.getToken().getType()) {
+        case HiveParser.TOK_IDENTITY:
+        case HiveParser.TOK_YEAR:
+        case HiveParser.TOK_MONTH:
+        case HiveParser.TOK_DAY:
+        case HiveParser.TOK_HOUR:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          break;
+        case HiveParser.TOK_TRUNCATE:
+        case HiveParser.TOK_BUCKET:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText());
+          break;
+        default:
+          spec.name = grandChild.getText();
+        }
+      }
+      partSpecList.add(spec);
+    }
+
+    return partSpecList;
+  }
+
+  public enum TransformTypes {

Review comment:
       Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] szlta commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
szlta commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r642953396



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class PartitionTransform {
+
+  private static final Map<Integer, TransformTypes> TRANSFORMS = Stream
+      .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY },
+          { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH },
+          { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR },
+          { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } })
+      .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1]));
+
+  /**
+   * Parse the partition transform specifications from the AST Tree node.
+   * @param node AST Tree node, must be not null
+   * @return list of partition transforms
+   */
+  public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) {
+    List<PartitionTransformSpec> partSpecList = new ArrayList<>();
+    for (int i = 0; i < node.getChildCount(); i++) {
+      PartitionTransformSpec spec = new PartitionTransformSpec();
+      ASTNode child = (ASTNode) node.getChild(i);
+      for (int j = 0; j < child.getChildCount(); j++) {
+        ASTNode grandChild = (ASTNode) child.getChild(j);
+        switch (grandChild.getToken().getType()) {
+        case HiveParser.TOK_IDENTITY:
+        case HiveParser.TOK_YEAR:
+        case HiveParser.TOK_MONTH:
+        case HiveParser.TOK_DAY:
+        case HiveParser.TOK_HOUR:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          break;
+        case HiveParser.TOK_TRUNCATE:
+        case HiveParser.TOK_BUCKET:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText());
+          break;
+        default:
+          spec.name = grandChild.getText();
+        }
+      }
+      partSpecList.add(spec);
+    }
+
+    return partSpecList;
+  }
+
+  public enum TransformTypes {

Review comment:
       I think it would be better to use the singular format: TransformType




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r646485008



##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g
##########
@@ -385,6 +385,7 @@ KW_DATACONNECTORS: 'CONNECTORS';
 KW_TYPE: 'TYPE';
 KW_URL: 'URL';
 KW_REMOTE: 'REMOTE';
+KW_SPEC: 'SPEC';

Review comment:
       Would this mean that columns called `spec` could only be used from now on using backticks?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r642964505



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -155,6 +155,36 @@ public void after() throws Exception {
     HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionTransform() {
+    Schema schema = new Schema(
+        optional(1, "id", Types.LongType.get()),
+        optional(2, "year_field", Types.DateType.get()),
+        optional(3, "month_field", Types.TimestampType.withZone()),
+        optional(4, "day_field", Types.TimestampType.withoutZone()),
+        optional(5, "hour_field", Types.TimestampType.withoutZone()),
+        optional(6, "truncate_field", Types.StringType.get()),
+        optional(7, "bucket_field", Types.StringType.get()),
+        optional(8, "identity_field", Types.StringType.get())
+    );
+    PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field")

Review comment:
       nit: newline before, and I think it is more readable if we break after every parameter, like:
   ```
   PartitionSpec.builderFor(schema)
   .year("year_field")
   .month("month_field")
   .day("day_field")
   .hour("hour_field")
   .truncate("truncate_field", 2)
   .bucket("bucket_field", 2)
   .identity("identity_field")
   .build();
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r646521390



##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g
##########
@@ -385,6 +385,7 @@ KW_DATACONNECTORS: 'CONNECTORS';
 KW_TYPE: 'TYPE';
 KW_URL: 'URL';
 KW_REMOTE: 'REMOTE';
+KW_SPEC: 'SPEC';

Review comment:
       Great, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643746537



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap
     }
   }
 
-  private static PartitionSpec spec(Schema schema, Properties properties,
+  private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) {
+    if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname))

Review comment:
       I agree. I created an util class

##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -402,10 +406,18 @@ private Schema schema(Properties properties, org.apache.hadoop.hive.metastore.ap
     }
   }
 
-  private static PartitionSpec spec(Schema schema, Properties properties,
+  private static PartitionSpec spec(Configuration configuration, Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    if (hmsTable.getParameters().get(InputFormatConfig.PARTITION_SPEC) != null) {
+    if (SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname))

Review comment:
       I agree. I created a util class

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -155,6 +155,36 @@ public void after() throws Exception {
     HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionTransform() {
+    Schema schema = new Schema(
+        optional(1, "id", Types.LongType.get()),
+        optional(2, "year_field", Types.DateType.get()),
+        optional(3, "month_field", Types.TimestampType.withZone()),
+        optional(4, "day_field", Types.TimestampType.withoutZone()),
+        optional(5, "hour_field", Types.TimestampType.withoutZone()),
+        optional(6, "truncate_field", Types.StringType.get()),
+        optional(7, "bucket_field", Types.StringType.get()),
+        optional(8, "identity_field", Types.StringType.get())
+    );
+    PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field")
+        .hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2)
+        .identity("identity_field").build();
+    String tableName = "part_test";
+
+    TableIdentifier identifier = TableIdentifier.of("default", tableName);
+    shell.executeStatement("CREATE EXTERNAL TABLE " + identifier +
+        " PARTITIONED BY SPEC (year_field year, month_field month, day_field day, hour_field hour, " +
+        "truncate_field truncate[2], bucket_field bucket[2], identity_field identity)" +
+        " STORED BY '" + HiveIcebergStorageHandler.class.getName() + "' " +

Review comment:
       Fixed

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -155,6 +155,36 @@ public void after() throws Exception {
     HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionTransform() {
+    Schema schema = new Schema(
+        optional(1, "id", Types.LongType.get()),
+        optional(2, "year_field", Types.DateType.get()),
+        optional(3, "month_field", Types.TimestampType.withZone()),
+        optional(4, "day_field", Types.TimestampType.withoutZone()),
+        optional(5, "hour_field", Types.TimestampType.withoutZone()),
+        optional(6, "truncate_field", Types.StringType.get()),
+        optional(7, "bucket_field", Types.StringType.get()),
+        optional(8, "identity_field", Types.StringType.get())
+    );
+    PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field")

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643139820



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -13455,6 +13469,18 @@ ASTNode analyzeCreateTable(
       }
     }
 
+    if (partitionTransformSpecExists) {
+      try {
+        HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler());

Review comment:
       Can we make this check before, at line 13409? That way we could skip parsing the partition spec unnecessarily first and wouldn't need the boolean




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643747277



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -155,6 +155,36 @@ public void after() throws Exception {
     HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionTransform() {
+    Schema schema = new Schema(
+        optional(1, "id", Types.LongType.get()),
+        optional(2, "year_field", Types.DateType.get()),
+        optional(3, "month_field", Types.TimestampType.withZone()),
+        optional(4, "day_field", Types.TimestampType.withoutZone()),
+        optional(5, "hour_field", Types.TimestampType.withoutZone()),
+        optional(6, "truncate_field", Types.StringType.get()),
+        optional(7, "bucket_field", Types.StringType.get()),
+        optional(8, "identity_field", Types.StringType.get())
+    );
+    PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field")
+        .hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2)
+        .identity("identity_field").build();
+    String tableName = "part_test";
+
+    TableIdentifier identifier = TableIdentifier.of("default", tableName);
+    shell.executeStatement("CREATE EXTERNAL TABLE " + identifier +
+        " PARTITIONED BY SPEC (year_field year, month_field month, day_field day, hour_field hour, " +
+        "truncate_field truncate[2], bucket_field bucket[2], identity_field identity)" +
+        " STORED BY '" + HiveIcebergStorageHandler.class.getName() + "' " +
+        testTables.locationForCreateTableSQL(identifier) +
+        "TBLPROPERTIES ('" + InputFormatConfig.TABLE_SCHEMA + "'='" +
+        SchemaParser.toJson(schema) + "', " +
+        "'" + InputFormatConfig.CATALOG_NAME + "'='" + Catalogs.ICEBERG_DEFAULT_CATALOG_NAME + "')");

Review comment:
       Removed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r642965021



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -155,6 +155,36 @@ public void after() throws Exception {
     HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionTransform() {
+    Schema schema = new Schema(
+        optional(1, "id", Types.LongType.get()),
+        optional(2, "year_field", Types.DateType.get()),
+        optional(3, "month_field", Types.TimestampType.withZone()),
+        optional(4, "day_field", Types.TimestampType.withoutZone()),
+        optional(5, "hour_field", Types.TimestampType.withoutZone()),
+        optional(6, "truncate_field", Types.StringType.get()),
+        optional(7, "bucket_field", Types.StringType.get()),
+        optional(8, "identity_field", Types.StringType.get())
+    );
+    PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field")
+        .hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2)
+        .identity("identity_field").build();
+    String tableName = "part_test";

Review comment:
       Do we need this, or it is enough to keep the `identifier` only?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643758291



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -13455,6 +13469,18 @@ ASTNode analyzeCreateTable(
       }
     }
 
+    if (partitionTransformSpecExists) {
+      try {
+        HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler());

Review comment:
       We need to parse all the tokens in the AST Tree before we can make any validation.
   If we have a query like this 
   `CREATE EXTERNAL TABLE ... PARTITIONED BY SPEC (....) STORED BY ICEBERG`, the tokens are processed in order.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643135880



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionTransform.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class PartitionTransform {
+
+  private static final Map<Integer, TransformTypes> TRANSFORMS = Stream
+      .of(new Object[][] { { HiveParser.TOK_IDENTITY, TransformTypes.IDENTITY },
+          { HiveParser.TOK_YEAR, TransformTypes.YEAR }, { HiveParser.TOK_MONTH, TransformTypes.MONTH },
+          { HiveParser.TOK_DAY, TransformTypes.DAY }, { HiveParser.TOK_HOUR, TransformTypes.HOUR },
+          { HiveParser.TOK_TRUNCATE, TransformTypes.TRUNCATE }, { HiveParser.TOK_BUCKET, TransformTypes.BUCKET } })
+      .collect(Collectors.toMap(e -> (Integer) e[0], e -> (TransformTypes) e[1]));
+
+  /**
+   * Parse the partition transform specifications from the AST Tree node.
+   * @param node AST Tree node, must be not null
+   * @return list of partition transforms
+   */
+  public static List<PartitionTransformSpec> getPartitionTransformSpec(ASTNode node) {
+    List<PartitionTransformSpec> partSpecList = new ArrayList<>();
+    for (int i = 0; i < node.getChildCount(); i++) {
+      PartitionTransformSpec spec = new PartitionTransformSpec();
+      ASTNode child = (ASTNode) node.getChild(i);
+      for (int j = 0; j < child.getChildCount(); j++) {
+        ASTNode grandChild = (ASTNode) child.getChild(j);
+        switch (grandChild.getToken().getType()) {
+        case HiveParser.TOK_IDENTITY:
+        case HiveParser.TOK_YEAR:
+        case HiveParser.TOK_MONTH:
+        case HiveParser.TOK_DAY:
+        case HiveParser.TOK_HOUR:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          break;
+        case HiveParser.TOK_TRUNCATE:
+        case HiveParser.TOK_BUCKET:
+          spec.transformType = TRANSFORMS.get(grandChild.getToken().getType());
+          spec.transformParam = Integer.valueOf(grandChild.getChild(0).getText());
+          break;
+        default:
+          spec.name = grandChild.getText();
+        }
+      }
+      partSpecList.add(spec);
+    }
+
+    return partSpecList;
+  }
+
+  public enum TransformTypes {
+    IDENTITY, YEAR, MONTH, DAY, HOUR, TRUNCATE, BUCKET
+  }
+
+  public static class PartitionTransformSpec {
+    public String name;
+    public TransformTypes transformType;
+    public int transformParam;

Review comment:
       Should we make this Integer or Optional< Integer >? This would have a 0 value by default for those spec types too that don't support any params, which can be misleading. Although granted, 0 doesn't make much sense for either bucketing or truncate, but maybe semantically it would be a bit better to make this nullable/empty.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r642963610



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -155,6 +155,36 @@ public void after() throws Exception {
     HiveIcebergStorageHandlerTestUtils.close(shell);
   }
 
+  @Test
+  public void testPartitionTransform() {
+    Schema schema = new Schema(
+        optional(1, "id", Types.LongType.get()),
+        optional(2, "year_field", Types.DateType.get()),
+        optional(3, "month_field", Types.TimestampType.withZone()),
+        optional(4, "day_field", Types.TimestampType.withoutZone()),
+        optional(5, "hour_field", Types.TimestampType.withoutZone()),
+        optional(6, "truncate_field", Types.StringType.get()),
+        optional(7, "bucket_field", Types.StringType.get()),
+        optional(8, "identity_field", Types.StringType.get())
+    );
+    PartitionSpec spec = PartitionSpec.builderFor(schema).year("year_field").month("month_field").day("day_field")
+        .hour("hour_field").truncate("truncate_field", 2).bucket("bucket_field", 2)
+        .identity("identity_field").build();
+    String tableName = "part_test";
+
+    TableIdentifier identifier = TableIdentifier.of("default", tableName);
+    shell.executeStatement("CREATE EXTERNAL TABLE " + identifier +
+        " PARTITIONED BY SPEC (year_field year, month_field month, day_field day, hour_field hour, " +
+        "truncate_field truncate[2], bucket_field bucket[2], identity_field identity)" +
+        " STORED BY '" + HiveIcebergStorageHandler.class.getName() + "' " +

Review comment:
       We can use `STORED BY ICEBERG`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r643749820



##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -3010,3 +3010,5 @@ const string TABLE_BUCKETING_VERSION = "bucketing_version",
 const string DRUID_CONFIG_PREFIX = "druid.",
 const string JDBC_CONFIG_PREFIX = "hive.sql.",
 const string TABLE_IS_CTAS = "created_with_ctas",
+const string PARTITION_TRANSFER_SPEC = "partition_transfer_spec",

Review comment:
       I don't know how this happened. Probably I did this typo when I was switching between the IDE and the text editor. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2333: HIVE-25179: Support all partition transforms for Iceberg in create table

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2333:
URL: https://github.com/apache/hive/pull/2333#discussion_r646538065



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java
##########
@@ -46,27 +50,60 @@ private IcebergTableUtil() {
    * @return an Iceberg table
    */
   static Table getTable(Configuration configuration, Properties properties) {
-    Table table = null;
-    QueryState queryState = null;
     String tableIdentifier = properties.getProperty(Catalogs.NAME);
-    if (SessionState.get() != null) {
-      queryState = SessionState.get().getQueryState(configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname));
-      if (queryState != null) {
-        table = (Table) queryState.getResource(tableIdentifier);
-      } else {
-        LOG.debug("QueryState is not available in SessionState. Loading {} from configured catalog.", tableIdentifier);
-      }
-    } else {
-      LOG.debug("SessionState is not available. Loading {} from configured catalog.", tableIdentifier);
-    }
+    return SessionStateUtil.getResource(configuration, tableIdentifier).filter(o -> o instanceof Table)
+        .map(o -> (Table) o).orElseGet(() -> {
+          LOG.debug("Iceberg table {} is not found in QueryState. Loading table from configured catalog",
+              tableIdentifier);
+          Table tab = Catalogs.loadTable(configuration, properties);
+          SessionStateUtil.addResource(configuration, tableIdentifier, tab);
+          return tab;
+        });
+  }
 
-    if (table == null) {
-      table = Catalogs.loadTable(configuration, properties);
-      if (queryState != null) {
-        queryState.addResource(tableIdentifier, table);
-      }
+  /**
+   * Create {@link PartitionSpec} based on the partition information stored in
+   * {@link org.apache.hadoop.hive.ql.parse.PartitionTransform.PartitionTransformSpec}.
+   * @param configuration a Hadoop configuration
+   * @param schema iceberg table schema
+   * @return iceberg partition spec, always non-null
+   */
+  public static PartitionSpec spec(Configuration configuration, Schema schema) {
+    List<PartitionTransform.PartitionTransformSpec> partitionTransformSpecList = SessionStateUtil
+            .getResource(configuration, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC)
+        .map(o -> (List<PartitionTransform.PartitionTransformSpec>) o).orElseGet(() -> null);
+
+    if (partitionTransformSpecList == null) {
+      LOG.debug("Iceberg partition transform spec is not found in QueryState.");

Review comment:
       I think this log line will be printed every time when we create or alter a table and there is no partition spec.
   This could be inconvenient




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org