You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/03 22:09:14 UTC

[GitHub] [incubator-iceberg] rdblue opened a new pull request #891: Parquet: Use new logical type annotations

rdblue opened a new pull request #891: Parquet: Use new logical type annotations
URL: https://github.com/apache/incubator-iceberg/pull/891
 
 
   Parquet 1.11.0 added logical type annotations that can be used to store Iceberg's `adjustToUTC` value like Avro, so that Parquet files correctly encode whether a value is a `timestamp` or a `timestamptz`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on issue #891: Parquet: Use new logical type annotations

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #891: Parquet: Use new logical type annotations
URL: https://github.com/apache/incubator-iceberg/pull/891#issuecomment-608710275
 
 
   @Fokko, you may be interested in this one.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #891: Parquet: Use new logical type annotations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #891:
URL: https://github.com/apache/incubator-iceberg/pull/891#discussion_r415405947



##########
File path: data/src/main/java/org/apache/iceberg/data/parquet/GenericParquetWriter.java
##########
@@ -117,6 +120,13 @@ private GenericParquetWriter() {
     @Override
     public ParquetValueWriter<?> primitive(PrimitiveType primitive) {
       ColumnDescriptor desc = type.getColumnDescription(currentPath());
+      LogicalTypeAnnotation logicalType = primitive.getLogicalTypeAnnotation();
+      if (logicalType != null) {
+        Optional<PrimitiveWriter<?>> writer = logicalType.accept(new LogicalTypeWriterVisitor(desc));
+        if (writer.isPresent()) {
+          return writer.get();
+        }

Review comment:
       Correct.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] waterlx commented on a change in pull request #891: Parquet: Use new logical type annotations

Posted by GitBox <gi...@apache.org>.
waterlx commented on a change in pull request #891:
URL: https://github.com/apache/incubator-iceberg/pull/891#discussion_r415244991



##########
File path: data/src/main/java/org/apache/iceberg/data/parquet/GenericParquetWriter.java
##########
@@ -117,6 +120,13 @@ private GenericParquetWriter() {
     @Override
     public ParquetValueWriter<?> primitive(PrimitiveType primitive) {
       ColumnDescriptor desc = type.getColumnDescription(currentPath());
+      LogicalTypeAnnotation logicalType = primitive.getLogicalTypeAnnotation();
+      if (logicalType != null) {
+        Optional<PrimitiveWriter<?>> writer = logicalType.accept(new LogicalTypeWriterVisitor(desc));
+        if (writer.isPresent()) {
+          return writer.get();
+        }

Review comment:
       When writer.isPresent() == false, we falls back to the previous logic, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #891: Parquet: Use new logical type annotations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #891:
URL: https://github.com/apache/incubator-iceberg/pull/891#discussion_r413177940



##########
File path: data/src/main/java/org/apache/iceberg/data/parquet/GenericParquetWriter.java
##########
@@ -200,6 +210,70 @@ private GenericParquetWriter() {
     }
   }
 
+  private static class LogicalTypeWriterVisitor implements LogicalTypeAnnotationVisitor<PrimitiveWriter<?>> {
+    private final ColumnDescriptor desc;
+
+    private LogicalTypeWriterVisitor(ColumnDescriptor desc) {
+      this.desc = desc;
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.StringLogicalTypeAnnotation stringType) {
+      return Optional.of(ParquetValueWriters.strings(desc));
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumType) {
+      return Optional.of(ParquetValueWriters.strings(desc));
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.DecimalLogicalTypeAnnotation decimalType) {
+      switch (desc.getPrimitiveType().getPrimitiveTypeName()) {
+        case INT32:
+          return Optional.of(ParquetValueWriters.decimalAsInteger(
+              desc, decimalType.getPrecision(), decimalType.getScale()));
+        case INT64:
+          return Optional.of(ParquetValueWriters.decimalAsLong(
+              desc, decimalType.getPrecision(), decimalType.getScale()));
+        case BINARY:
+        case FIXED_LEN_BYTE_ARRAY:
+          return Optional.of(ParquetValueWriters.decimalAsFixed(
+              desc, decimalType.getPrecision(), decimalType.getScale()));
+      }
+      return Optional.empty();
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.DateLogicalTypeAnnotation dateType) {
+      return Optional.of(new DateWriter(desc));
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.TimeLogicalTypeAnnotation timeType) {
+      return Optional.of(new TimeWriter(desc));
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampType) {
+      if (timestampType.isAdjustedToUTC()) {
+        return Optional.of(new TimestamptzWriter(desc));
+      } else {
+        return Optional.of(new TimestampWriter(desc));
+      }
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.JsonLogicalTypeAnnotation jsonLogicalType) {

Review comment:
       Methods that aren't implemented here use the superclass implementation, which returns `Optional.empty`. When that happens, we fall back to the old logic that uses just the primitive type.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on pull request #891: Parquet: Use new logical type annotations

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #891:
URL: https://github.com/apache/incubator-iceberg/pull/891#issuecomment-625467381


   Thanks for reviewing, everyone!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] wangmiao1981 commented on a change in pull request #891: Parquet: Use new logical type annotations

Posted by GitBox <gi...@apache.org>.
wangmiao1981 commented on a change in pull request #891:
URL: https://github.com/apache/incubator-iceberg/pull/891#discussion_r412674442



##########
File path: data/src/main/java/org/apache/iceberg/data/parquet/GenericParquetWriter.java
##########
@@ -200,6 +210,70 @@ private GenericParquetWriter() {
     }
   }
 
+  private static class LogicalTypeWriterVisitor implements LogicalTypeAnnotationVisitor<PrimitiveWriter<?>> {
+    private final ColumnDescriptor desc;
+
+    private LogicalTypeWriterVisitor(ColumnDescriptor desc) {
+      this.desc = desc;
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.StringLogicalTypeAnnotation stringType) {
+      return Optional.of(ParquetValueWriters.strings(desc));
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumType) {
+      return Optional.of(ParquetValueWriters.strings(desc));
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.DecimalLogicalTypeAnnotation decimalType) {
+      switch (desc.getPrimitiveType().getPrimitiveTypeName()) {
+        case INT32:
+          return Optional.of(ParquetValueWriters.decimalAsInteger(
+              desc, decimalType.getPrecision(), decimalType.getScale()));
+        case INT64:
+          return Optional.of(ParquetValueWriters.decimalAsLong(
+              desc, decimalType.getPrecision(), decimalType.getScale()));
+        case BINARY:
+        case FIXED_LEN_BYTE_ARRAY:
+          return Optional.of(ParquetValueWriters.decimalAsFixed(
+              desc, decimalType.getPrecision(), decimalType.getScale()));
+      }
+      return Optional.empty();
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.DateLogicalTypeAnnotation dateType) {
+      return Optional.of(new DateWriter(desc));
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.TimeLogicalTypeAnnotation timeType) {
+      return Optional.of(new TimeWriter(desc));
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampType) {
+      if (timestampType.isAdjustedToUTC()) {
+        return Optional.of(new TimestamptzWriter(desc));
+      } else {
+        return Optional.of(new TimestampWriter(desc));
+      }
+    }
+
+    @Override
+    public Optional<PrimitiveWriter<?>> visit(LogicalTypeAnnotation.JsonLogicalTypeAnnotation jsonLogicalType) {

Review comment:
       In Parquet, LogicalTypeAnnotation.java, there is `LogicalTypeAnnotation.IntLogicalTypeAnnotation`. Do we need it here too? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] danielcweeks commented on pull request #891: Parquet: Use new logical type annotations

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on pull request #891:
URL: https://github.com/apache/incubator-iceberg/pull/891#issuecomment-624942454


   This all looks good to me. +1


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org