You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2020/09/28 12:55:31 UTC

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #222: Parquet-313: Implement 3 level list writing rule for Parquet-Thrift

gszadovszky commented on a change in pull request #222:
URL: https://github.com/apache/parquet-mr/pull/222#discussion_r495908106



##########
File path: parquet-thrift/src/main/java/org/apache/parquet/hadoop/thrift/ParquetThriftBytesOutputFormat.java
##########
@@ -57,13 +58,33 @@ public ParquetThriftBytesOutputFormat() {
    *  when catching an exception the record can be discarded.
    *  The non-buffered implementation will stream field by field. Exceptions are unrecoverable and the file must be closed when an invalid record is written.
    *
+   * @param configuration configuration
    * @param protocolFactory the protocol factory to use to read the bytes
-   * @param thriftClass thriftClass the class to exctract the schema from
+   * @param thriftClass thriftClass the class to extract the schema from
    * @param buffered whether we should buffer each record
    * @param errorHandler handle record corruption and schema incompatible exception
    */
-  public ParquetThriftBytesOutputFormat(TProtocolFactory protocolFactory, Class<? extends TBase<?, ?>> thriftClass, boolean buffered, FieldIgnoredHandler errorHandler) {
-    super(new ThriftBytesWriteSupport(protocolFactory, thriftClass, buffered, errorHandler));
+  public ParquetThriftBytesOutputFormat(Configuration configuration,
+                                        TProtocolFactory protocolFactory,
+                                        Class<? extends TBase<?, ?>> thriftClass,
+                                        boolean buffered,
+                                        FieldIgnoredHandler errorHandler) {
+    super(new ThriftBytesWriteSupport(
+        configuration, protocolFactory, thriftClass, buffered, errorHandler));
+  }
+
+  @Deprecated
+  /**
+   * @deprecated Use @link{ParquetThriftBytesOutputFormat(
+   * Configuration configuration, TProtocolFactory protocolFactory,
+   * Class<? extends TBase<?, ?>> thriftClass, boolean buffered,
+   * FieldIgnoredHandler errorHandler)} instead
+   */
+  public ParquetThriftBytesOutputFormat(TProtocolFactory protocolFactory,

Review comment:
       nit:
   ```suggestion
     /**
      * Configuration configuration, TProtocolFactory protocolFactory,
      * Class<? extends TBase<?, ?>> thriftClass, boolean buffered,
      * FieldIgnoredHandler errorHandler)} instead
      * @deprecated Use @link{ParquetThriftBytesOutputFormat(
      */
     @Deprecated
     public ParquetThriftBytesOutputFormat(TProtocolFactory protocolFactory,
   ```

##########
File path: parquet-thrift/src/main/java/org/apache/parquet/hadoop/thrift/ThriftBytesWriteSupport.java
##########
@@ -79,14 +79,34 @@
   private StructType thriftStruct;
   private ParquetWriteProtocol parquetWriteProtocol;
   private final FieldIgnoredHandler errorHandler;
+  private Configuration configuration;
 
   public ThriftBytesWriteSupport() {
     this.buffered = true;
     this.errorHandler = null;
   }
 
-  public ThriftBytesWriteSupport(TProtocolFactory protocolFactory, Class<? extends TBase<?, ?>> thriftClass, boolean buffered, FieldIgnoredHandler errorHandler) {
+  @Deprecated
+  /**
+   * @deprecated Use @link{ThriftBytesWriteSupport(Configuration configuration,
+   * TProtocolFactory protocolFactory, Class<? extends TBase<?, ?>> thriftClass,
+   * boolean buffered, FieldIgnoredHandler errorHandler)} instead
+   */
+  public ThriftBytesWriteSupport(TProtocolFactory protocolFactory,

Review comment:
       nit:
   ```suggestion
     /**
      * TProtocolFactory protocolFactory, Class<? extends TBase<?, ?>> thriftClass,
      * boolean buffered, FieldIgnoredHandler errorHandler)} instead
      * @deprecated Use @link{ThriftBytesWriteSupport(Configuration configuration,
      */
     @Deprecated
     public ThriftBytesWriteSupport(TProtocolFactory protocolFactory,
   ```

##########
File path: parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java
##########
@@ -76,22 +78,37 @@
   private final boolean doProjection;
   private final boolean keepOneOfEachUnion;
 
-  private ThriftSchemaConvertVisitor(FieldProjectionFilter fieldProjectionFilter, boolean doProjection, boolean keepOneOfEachUnion) {
+  private boolean writeThreeLevelList = ParquetWriteProtocol.WRITE_THREE_LEVEL_LISTS_DEFAULT;

Review comment:
       It seems this variable gets a value in the constructor anyway. I would suggest having it final and assign value in the constructor only once.

##########
File path: parquet-thrift/src/main/java/org/apache/parquet/thrift/ParquetWriteProtocol.java
##########
@@ -51,6 +52,11 @@
 
 public class ParquetWriteProtocol extends ParquetProtocol {
 
+  public static final String WRITE_THREE_LEVEL_LISTS = "parquet.thrift.write-three-level-lists";

Review comment:
       I know it is only about this change but we always struggles with our documentation. We have started to document the available configs in [parquet-hadoop](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md) and also in [parquet-avro](https://github.com/apache/parquet-mr/blob/master/parquet-avro/README.md). Maybe, it worth a similar documentation in parquet-thrift as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org