You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/11 14:23:19 UTC

[GitHub] [iceberg] pvary opened a new pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

pvary opened a new pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917


   After consulting with the field folks they convinced me that it would be beneficial to have the first version of conversion in place for creating partitioned Iceberg tables from Hive. They suggested that even in this limited form this feature can boost adoption by allowing to try out Iceberg tables with partitions without changing the actual SQL commands.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

pvary commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r541776361



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandler.java
##########
@@ -666,45 +666,52 @@ public void testCreateTableWithColumnSpecification() throws IOException {
   }
 
   @Test
-  public void testCreateTableWithColumnSpecificationPartitioned() {
+  public void testCreateTableWithColumnSpecificationPartitioned() throws IOException {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    runPartitionedWriteTest(identifier, createSQLForPartitionedTableBySchema(testTables, identifier));
+  }
+
+  @Test
+  public void testCreatePartitionedTableByProperty() throws IOException {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    runPartitionedWriteTest(identifier, createSQLForPartitionedTableByProperty(testTables, identifier));
+  }
+
+  @Test
+  public void testCreatePartitionedTableWithPropertiesAndWithColumnSpecification() {
+    PartitionSpec spec = PartitionSpec.builderFor(CUSTOMER_SCHEMA).identity("last_name").build();
+
     AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
-        "currently not supported", () -> {
+        "Provide only one of the following", () -> {
           shell.executeStatement("CREATE EXTERNAL TABLE customers (customer_id BIGINT) " +
               "PARTITIONED BY (first_name STRING) " +
               "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' " +
-              testTables.locationForCreateTableSQL(TableIdentifier.of("default", "customers")));
+              testTables.locationForCreateTableSQL(TableIdentifier.of("default", "customers")) +
+              " TBLPROPERTIES ('" + InputFormatConfig.PARTITION_SPEC + "'='" +
+              PartitionSpecParser.toJson(spec) + "')");
         }
     );
   }
 
   @Test
-  public void testCreatePartitionedTable() throws IOException {
+  public void testCreateTableWithColumnSpecificationMultilevelPartitioned() {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

pvary commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r541776376



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+        // Add partitioning columns to the original column list before creating the Iceberg Schema
+        List<FieldSchema> cols = new ArrayList<>(hmsTable.getSd().getCols());
+        cols.addAll(hmsTable.getPartitionKeys());
+        return HiveSchemaUtil.convert(cols);
+      } else {
+        return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      }
     }
   }
 
   private static PartitionSpec spec(Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    Preconditions.checkArgument(hmsTable.getPartitionKeys() == null || hmsTable.getPartitionKeys().isEmpty(),
-        "Partitioned Hive tables are currently not supported");
-
     if (properties.getProperty(InputFormatConfig.PARTITION_SPEC) != null) {
+      Preconditions.checkArgument(hmsTable.isSetPartitionKeys() || hmsTable.getPartitionKeys().isEmpty(),

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r544746099



##########
File path: build.gradle
##########
@@ -542,7 +542,7 @@ project(':iceberg-mr') {
 
   test {
     // testJoinTables / testScanTable
-    maxHeapSize '1500m'
+    maxHeapSize '2500m'

Review comment:
       Why was this needed? Additional tasks because of partitioning?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#issuecomment-747147630


   Thanks @marton-bod for reviewing, and @pvary for working on this!
   
   It looks good to me. I noted a few nits to fix, but I'm also fine merging this if you don't have time to fix them. I'll wait a day or two and then merge if I don't hear back.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] marton-bod commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

marton-bod commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r541574994



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+        // Add partitioning columns to the original column list before creating the Iceberg Schema
+        List<FieldSchema> cols = new ArrayList<>(hmsTable.getSd().getCols());
+        cols.addAll(hmsTable.getPartitionKeys());
+        return HiveSchemaUtil.convert(cols);
+      } else {
+        return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      }
     }
   }
 
   private static PartitionSpec spec(Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    Preconditions.checkArgument(hmsTable.getPartitionKeys() == null || hmsTable.getPartitionKeys().isEmpty(),
-        "Partitioned Hive tables are currently not supported");
-
     if (properties.getProperty(InputFormatConfig.PARTITION_SPEC) != null) {
+      Preconditions.checkArgument(hmsTable.isSetPartitionKeys() || hmsTable.getPartitionKeys().isEmpty(),

Review comment:
       shouldn't this be `!hmsTable.isSetPartitionKeys()`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

pvary commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r551856778



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {

Review comment:
       Theoretically it could be set to an empty list. Checked this way to keep on the safe side.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r544744608



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+        // Add partitioning columns to the original column list before creating the Iceberg Schema
+        List<FieldSchema> cols = new ArrayList<>(hmsTable.getSd().getCols());

Review comment:
       Nit: generally prefer `Lists.newArrayList()` to specific class constructors.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] marton-bod commented on pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

marton-bod commented on pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#issuecomment-746072184


   @rdblue Could you please take a look at this as well, whenever suitable? :) Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

pvary commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r541571405



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && hmsTable.getPartitionKeys().size() != 0) {

Review comment:
       Done

##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && hmsTable.getPartitionKeys().size() != 0) {
+        // Add partitioning columns to the original column list before creating the Iceberg Schema
+        List<FieldSchema> cols = new ArrayList<>(hmsTable.getSd().getCols());
+        cols.addAll(hmsTable.getPartitionKeys());
+        return HiveSchemaUtil.convert(cols);
+      } else {
+        return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      }
     }
   }
 
   private static PartitionSpec spec(Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    Preconditions.checkArgument(hmsTable.getPartitionKeys() == null || hmsTable.getPartitionKeys().isEmpty(),
-        "Partitioned Hive tables are currently not supported");
-
     if (properties.getProperty(InputFormatConfig.PARTITION_SPEC) != null) {
+      Preconditions.checkArgument(hmsTable.getPartitionKeys() == null || hmsTable.getPartitionKeys().isEmpty(),

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r544745302



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+        // Add partitioning columns to the original column list before creating the Iceberg Schema
+        List<FieldSchema> cols = new ArrayList<>(hmsTable.getSd().getCols());
+        cols.addAll(hmsTable.getPartitionKeys());
+        return HiveSchemaUtil.convert(cols);
+      } else {
+        return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      }
     }
   }
 
   private static PartitionSpec spec(Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    Preconditions.checkArgument(hmsTable.getPartitionKeys() == null || hmsTable.getPartitionKeys().isEmpty(),
-        "Partitioned Hive tables are currently not supported");
-
     if (properties.getProperty(InputFormatConfig.PARTITION_SPEC) != null) {
+      Preconditions.checkArgument(!hmsTable.isSetPartitionKeys() || hmsTable.getPartitionKeys().isEmpty(),
+          "Provide only one of the following: Hive partition specification, or the " +
+              InputFormatConfig.PARTITION_SPEC + " property");
       return PartitionSpecParser.fromJson(schema, properties.getProperty(InputFormatConfig.PARTITION_SPEC));
     } else {
-      return PartitionSpec.unpartitioned();
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {

Review comment:
       Nit: could be `else if` case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r544745083



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {

Review comment:
       When would `isSetPartitionKeys()` be true and `getPartitionKeys()` empty?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] marton-bod commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

marton-bod commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r541570062



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && hmsTable.getPartitionKeys().size() != 0) {
+        // Add partitioning columns to the original column list before creating the Iceberg Schema
+        List<FieldSchema> cols = new ArrayList<>(hmsTable.getSd().getCols());
+        cols.addAll(hmsTable.getPartitionKeys());
+        return HiveSchemaUtil.convert(cols);
+      } else {
+        return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      }
     }
   }
 
   private static PartitionSpec spec(Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    Preconditions.checkArgument(hmsTable.getPartitionKeys() == null || hmsTable.getPartitionKeys().isEmpty(),
-        "Partitioned Hive tables are currently not supported");
-
     if (properties.getProperty(InputFormatConfig.PARTITION_SPEC) != null) {
+      Preconditions.checkArgument(hmsTable.getPartitionKeys() == null || hmsTable.getPartitionKeys().isEmpty(),

Review comment:
       nit: shall we check `hmsTable.isSetPartitionKeys()` here? in other places we seem to use that instead of the null check




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r544744897



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {

Review comment:
       Nit: this could be an `else if` to remove a level of indentation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] marton-bod commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

marton-bod commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r541569540



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandler.java
##########
@@ -666,45 +666,52 @@ public void testCreateTableWithColumnSpecification() throws IOException {
   }
 
   @Test
-  public void testCreateTableWithColumnSpecificationPartitioned() {
+  public void testCreateTableWithColumnSpecificationPartitioned() throws IOException {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    runPartitionedWriteTest(identifier, createSQLForPartitionedTableBySchema(testTables, identifier));
+  }
+
+  @Test
+  public void testCreatePartitionedTableByProperty() throws IOException {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    runPartitionedWriteTest(identifier, createSQLForPartitionedTableByProperty(testTables, identifier));
+  }
+
+  @Test
+  public void testCreatePartitionedTableWithPropertiesAndWithColumnSpecification() {
+    PartitionSpec spec = PartitionSpec.builderFor(CUSTOMER_SCHEMA).identity("last_name").build();
+
     AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
-        "currently not supported", () -> {
+        "Provide only one of the following", () -> {
           shell.executeStatement("CREATE EXTERNAL TABLE customers (customer_id BIGINT) " +
               "PARTITIONED BY (first_name STRING) " +
               "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' " +
-              testTables.locationForCreateTableSQL(TableIdentifier.of("default", "customers")));
+              testTables.locationForCreateTableSQL(TableIdentifier.of("default", "customers")) +
+              " TBLPROPERTIES ('" + InputFormatConfig.PARTITION_SPEC + "'='" +
+              PartitionSpecParser.toJson(spec) + "')");
         }
     );
   }
 
   @Test
-  public void testCreatePartitionedTable() throws IOException {
+  public void testCreateTableWithColumnSpecificationMultilevelPartitioned() {

Review comment:
       Can we make `runPartitionedWriteTest` generic enough to reuse it here too? Would be great to have an execution test for multi-level partition as well.

##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && hmsTable.getPartitionKeys().size() != 0) {

Review comment:
       nit: can use `!hmsTable.getPartitionKeys().isEmpty()`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

pvary commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r551857035



##########
File path: build.gradle
##########
@@ -542,7 +542,7 @@ project(':iceberg-mr') {
 
   test {
     // testJoinTables / testScanTable
-    maxHeapSize '1500m'
+    maxHeapSize '2500m'

Review comment:
       Yeah. The extra tasks eat more memory :(




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue merged pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

rdblue merged pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

pvary commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r551856326



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+        // Add partitioning columns to the original column list before creating the Iceberg Schema
+        List<FieldSchema> cols = new ArrayList<>(hmsTable.getSd().getCols());

Review comment:
       Fix in #2029

##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {

Review comment:
       Fix in #2029




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on a change in pull request #1917: Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

Posted by GitBox <gi...@apache.org>.

pvary commented on a change in pull request #1917:
URL: https://github.com/apache/iceberg/pull/1917#discussion_r551856882



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
     if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
       return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
     } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+        // Add partitioning columns to the original column list before creating the Iceberg Schema
+        List<FieldSchema> cols = new ArrayList<>(hmsTable.getSd().getCols());
+        cols.addAll(hmsTable.getPartitionKeys());
+        return HiveSchemaUtil.convert(cols);
+      } else {
+        return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      }
     }
   }
 
   private static PartitionSpec spec(Schema schema, Properties properties,
       org.apache.hadoop.hive.metastore.api.Table hmsTable) {
 
-    Preconditions.checkArgument(hmsTable.getPartitionKeys() == null || hmsTable.getPartitionKeys().isEmpty(),
-        "Partitioned Hive tables are currently not supported");
-
     if (properties.getProperty(InputFormatConfig.PARTITION_SPEC) != null) {
+      Preconditions.checkArgument(!hmsTable.isSetPartitionKeys() || hmsTable.getPartitionKeys().isEmpty(),
+          "Provide only one of the following: Hive partition specification, or the " +
+              InputFormatConfig.PARTITION_SPEC + " property");
       return PartitionSpecParser.fromJson(schema, properties.getProperty(InputFormatConfig.PARTITION_SPEC));
     } else {
-      return PartitionSpec.unpartitioned();
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {

Review comment:
       Fix in #2029




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org