Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/09 20:34:23 UTC

[GitHub] [incubator-iceberg] rdblue opened a new pull request #909: Parquet: Support constant map for partition values

rdblue opened a new pull request #909: Parquet: Support constant map for partition values
URL: https://github.com/apache/incubator-iceberg/pull/909
 
 
   This is a follow-up to #896, which added the same constant map support for Avro.
   
   Fixes #575 for Parquet and replaces #585.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #909: Parquet: Support constant map for partition values

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #909: Parquet: Support constant map for partition values
URL: https://github.com/apache/incubator-iceberg/pull/909#discussion_r407016734
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestPartitionValues.java
 ##########
 @@ -307,4 +308,72 @@ public void testPartitionValueTypes() throws Exception {
       TestTables.clearTables();
     }
   }
+
+  @Test
+  public void testNestedPartitionValues() throws Exception {
+    Assume.assumeTrue("ORC can't project nested partition values", !format.equalsIgnoreCase("orc"));
+
+    String[] columnNames = new String[] {
+        "b", "i", "l", "f", "d", "date", "ts", "s", "bytes", "dec_9_0", "dec_11_2", "dec_38_10"
+    };
+
+    HadoopTables tables = new HadoopTables(spark.sessionState().newHadoopConf());
+    Schema nestedSchema = new Schema(optional(1, "nested", SUPPORTED_PRIMITIVES.asStruct()));
+
+    // create a table around the source data
+    String sourceLocation = temp.newFolder("source_table").toString();
+    Table source = tables.create(nestedSchema, sourceLocation);
+
+    // write out an Avro data file with all of the data types for source data
+    List<GenericData.Record> expected = RandomData.generateList(source.schema(), 2, 128735L);
+    File avroData = temp.newFile("data.avro");
+    Assert.assertTrue(avroData.delete());
+    try (FileAppender<GenericData.Record> appender = Avro.write(Files.localOutput(avroData))
+        .schema(source.schema())
+        .build()) {
+      appender.addAll(expected);
+    }
+
+    // add the Avro data file to the source table
 
 Review comment:
   Why not write the data for the parameterized format for which the test is running?



[GitHub] [incubator-iceberg] rdblue commented on issue #909: Parquet: Support constant map for partition values

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #909: Parquet: Support constant map for partition values
URL: https://github.com/apache/incubator-iceberg/pull/909#issuecomment-612451598
 
 
   Thanks for reviewing @rdsr!



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #909: Parquet: Support constant map for partition values

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #909: Parquet: Support constant map for partition values
URL: https://github.com/apache/incubator-iceberg/pull/909#discussion_r406461875
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/avro/ValueReaders.java
 ##########
 @@ -597,10 +595,6 @@ protected StructReader(List<ValueReader<?>> readers, Types.StructType struct, Ma
 
     protected abstract void set(S struct, int pos, Object value);
 
-    protected Object prepareConstant(Type type, Object value) {
 
 Review comment:
   I'm moving this out of Avro and adding a callback to `PartitionUtil.constantsMap` that converts the constants. That way, Spark can supply one conversion function and use it in both places, instead of duplicating the conversion in the Avro and Parquet readers.
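
   For readers following along, here is a minimal sketch of the callback idea, not the code in this PR. Every name in it (`ConstantsMapSketch`, `FieldType`, `EngineString`, `PartitionField`, `toEngineValue`) is hypothetical, and the `constantsMap` method below only mirrors the shape of the idea; it is not the actual `PartitionUtil.constantsMap` signature. The point is that the map builder takes the conversion as a parameter, so an engine can supply it once and share it across file formats.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class ConstantsMapSketch {
  /** Hypothetical stand-in for an Iceberg type; not the real Type class. */
  public enum FieldType { STRING, INTEGER }

  /** Hypothetical engine-specific string wrapper (an assumption for illustration). */
  public static final class EngineString {
    public final String value;

    public EngineString(String value) {
      this.value = value;
    }
  }

  /** Hypothetical partition field: field id, type, and raw partition value. */
  public static final class PartitionField {
    public final int fieldId;
    public final FieldType type;
    public final Object value;

    public PartitionField(int fieldId, FieldType type, Object value) {
      this.fieldId = fieldId;
      this.type = type;
      this.value = value;
    }
  }

  /** Builds a field-id-to-constant map, passing each value through the supplied callback. */
  public static Map<Integer, Object> constantsMap(
      List<PartitionField> fields, BiFunction<FieldType, Object, Object> convertConstant) {
    Map<Integer, Object> constants = new HashMap<>();
    for (PartitionField field : fields) {
      constants.put(field.fieldId, convertConstant.apply(field.type, field.value));
    }
    return constants;
  }

  /** Example engine-side conversion: wrap strings, pass every other value through unchanged. */
  public static Object toEngineValue(FieldType type, Object value) {
    if (value == null) {
      return null;
    }
    if (type == FieldType.STRING) {
      return new EngineString(value.toString());
    }
    return value;
  }

  public static void main(String[] args) {
    List<PartitionField> fields = Arrays.asList(
        new PartitionField(1, FieldType.STRING, "2020-04-09"),
        new PartitionField(2, FieldType.INTEGER, 7));

    // The same callback could be handed to an Avro reader and a Parquet reader alike.
    Map<Integer, Object> constants = constantsMap(fields, ConstantsMapSketch::toEngineValue);
    System.out.println(constants.get(1) instanceof EngineString);
    System.out.println(constants.get(2));
  }
}
```

   With this shape, the readers only look up already-converted constants by field id, and the engine-specific logic lives in one function.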



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #909: Parquet: Support constant map for partition values

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #909: Parquet: Support constant map for partition values
URL: https://github.com/apache/incubator-iceberg/pull/909#discussion_r406463743
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkValueReaders.java
 ##########
 @@ -287,30 +284,5 @@ protected void set(InternalRow struct, int pos, Object value) {
         struct.setNullAt(pos);
       }
     }
-
-    @Override
-    protected Object prepareConstant(Type type, Object value) {
 
 Review comment:
   Moved into Spark.



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #909: Parquet: Support constant map for partition values

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #909: Parquet: Support constant map for partition values
URL: https://github.com/apache/incubator-iceberg/pull/909#discussion_r407079076
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestPartitionValues.java
 ##########
 @@ -307,4 +308,72 @@ public void testPartitionValueTypes() throws Exception {
       TestTables.clearTables();
     }
   }
+
+  @Test
+  public void testNestedPartitionValues() throws Exception {
+    Assume.assumeTrue("ORC can't project nested partition values", !format.equalsIgnoreCase("orc"));
+
+    String[] columnNames = new String[] {
+        "b", "i", "l", "f", "d", "date", "ts", "s", "bytes", "dec_9_0", "dec_11_2", "dec_38_10"
+    };
+
+    HadoopTables tables = new HadoopTables(spark.sessionState().newHadoopConf());
+    Schema nestedSchema = new Schema(optional(1, "nested", SUPPORTED_PRIMITIVES.asStruct()));
+
+    // create a table around the source data
+    String sourceLocation = temp.newFolder("source_table").toString();
+    Table source = tables.create(nestedSchema, sourceLocation);
+
+    // write out an Avro data file with all of the data types for source data
+    List<GenericData.Record> expected = RandomData.generateList(source.schema(), 2, 128735L);
+    File avroData = temp.newFile("data.avro");
+    Assert.assertTrue(avroData.delete());
+    try (FileAppender<GenericData.Record> appender = Avro.write(Files.localOutput(avroData))
+        .schema(source.schema())
+        .build()) {
+      appender.addAll(expected);
+    }
+
+    // add the Avro data file to the source table
 
 Review comment:
   This is just source data for the write from Spark with the target format. Since it isn't part of the test, we don't want it to change at all in ways that might affect the test.



[GitHub] [incubator-iceberg] rdblue merged pull request #909: Parquet: Support constant map for partition values

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #909: Parquet: Support constant map for partition values
URL: https://github.com/apache/incubator-iceberg/pull/909
 
 
   
