You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/29 01:08:31 UTC

[GitHub] [iceberg] dmgcodevil opened a new pull request #2001: iss6369 a special close flag added to indicate that the writer is closed

dmgcodevil opened a new pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r551623386



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -170,6 +181,9 @@ private void flushRowGroup(boolean finished) {
   }
 
   private void startRowGroup() {
+    if (this.closed) {
+      throw new IllegalStateException("writer is closed");

Review comment:
       Error messages should begin with a capital letter.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r552117995



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -170,6 +181,10 @@ private void flushRowGroup(boolean finished) {
   }
 
   private void startRowGroup() {
+    if (this.closed) {
+      throw new IllegalStateException("Writer is closed");
+    }

Review comment:
       We would normally use `Preconditions.checkState` for this, but this is minor.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r552117580



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -124,10 +125,20 @@ public Metrics metrics() {
     return ParquetUtil.footerMetrics(writer.getFooter(), model.metrics(), metricsConfig);
   }
 
+  /**
+   * Returns the approximate length of the output file produced by this writer.
+   * <p>
+   * Prior to calling {@link ParquetWriter#close}, the result is approximate. After calling close, the length is
+   * exact.

Review comment:
       This needs a `@return` as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r551625141



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -124,10 +125,20 @@ public Metrics metrics() {
     return ParquetUtil.footerMetrics(writer.getFooter(), model.metrics(), metricsConfig);
   }
 
+  /**
+   * Returns the number of bytes written by this writer.
+   * NOTE: call {@link ParquetWriter#close()})} prior this method to get the exact number of bytes.

Review comment:
       This is going to be wrapped into a single paragraph. Also, it is okay to describe the behavior without `NOTE`.
   
   How about this?
   
   ```java
     /**
      * Returns the approximate length of the output file produced by this writer.
      * <p>
      * Prior to calling {@link ParquetWriter#close}, the result is approximate. After calling close, the length is exact.
      */
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#issuecomment-752181783


   Thanks for fixing this, @dmgcodevil! Looks close, just a few things to update.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dmgcodevil commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
dmgcodevil commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r551521695



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -192,5 +203,6 @@ public void close() throws IOException {
     flushRowGroup(true);
     writeStore.close();
     writer.end(metadata);
+    this.closed = true;

Review comment:
       I'd also add :
   
   ```java
   else {
     throw new IllegalStateException("writer is closed");
   }
   ```
   
   but note sure if it aligns with Iceberg's coding style 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r551624274



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -124,10 +125,20 @@ public Metrics metrics() {
     return ParquetUtil.footerMetrics(writer.getFooter(), model.metrics(), metricsConfig);
   }
 
+  /**
+   * Returns the number of bytes written by this writer.
+   * NOTE: call {@link ParquetWriter#close()})} prior this method to get the exact number of bytes.
+   *
+   * @return the number of bytes written by this writer.

Review comment:
       This isn't quite accurate. It is the number of bytes written only after the writer is closed. How about `@return length of the file produced by this writer, approximate until after closed`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r549792911



##########
File path: parquet/src/test/java/org/apache/iceberg/parquet/TestParquet.java
##########
@@ -82,15 +114,17 @@ private File generateFileWithTwoRowGroups(Function<MessageType, ParquetValueWrit
       records.add(record);
     }
 
-    // Force multiple row groups by making the byte size very small
+    // Force multiple row groups by making the byte size vParquetWriteAdapterery small

Review comment:
       Looks like a typo.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dmgcodevil commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
dmgcodevil commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r551631108



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -124,10 +125,20 @@ public Metrics metrics() {
     return ParquetUtil.footerMetrics(writer.getFooter(), model.metrics(), metricsConfig);
   }
 
+  /**
+   * Returns the number of bytes written by this writer.
+   * NOTE: call {@link ParquetWriter#close()})} prior this method to get the exact number of bytes.

Review comment:
       sounds good




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#issuecomment-754931425


   Thanks @dmgcodevil! I merged this. It should be in the 0.11.0 release.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r549793691



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -192,5 +203,6 @@ public void close() throws IOException {
     flushRowGroup(true);
     writeStore.close();
     writer.end(metadata);
+    this.closed = true;

Review comment:
       If this is going to track whether the writer is closed, then it should make the close idempotent by checking the flag:
   
   ```java
     if (!closed) {
       this.closed = true;
       flushRowGroup(true);
       writeStore.close();
       writer.end(metadata);
     }
   ```
   
   I would also add a precondition in `startRowGroup` that checks that `closed` is false.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r549792717



##########
File path: parquet/src/test/java/org/apache/iceberg/parquet/ParquetWritingTestUtils.java
##########
@@ -53,20 +54,43 @@ static File writeRecords(TemporaryFolder temp, Schema schema,
   }
 
   static File writeRecords(
-      TemporaryFolder temp,
-      Schema schema, Map<String, String> properties,
-      Function<MessageType, ParquetValueWriter<?>> createWriterFunc,
-      GenericData.Record... records) throws IOException {
+          TemporaryFolder temp,
+          Schema schema, Map<String, String> properties,
+          Function<MessageType, ParquetValueWriter<?>> createWriterFunc,
+          GenericData.Record... records) throws IOException {
+    File file = createTempFile(temp);
+    write(file, schema, properties, createWriterFunc, records);
+    return file;
+  }
+
+  static long write(File file, Schema schema, Map<String, String> properties,
+                    Function<MessageType, ParquetValueWriter<?>> createWriterFunc,
+                    GenericData.Record... records) throws IOException {
+    FileAppender<GenericData.Record> writer = Parquet.write(localOutput(file))

Review comment:
       This should still use try-with-resources. You just have to keep a reference to the writer outside of it:
   
   ```java
     FileAppender<Record> writer = Parquet.write(...);
     try (Closeable toClose = writer) {
       writer.addAll(records);
     }
   
     writer.length();
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r552117769



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -124,10 +125,20 @@ public Metrics metrics() {
     return ParquetUtil.footerMetrics(writer.getFooter(), model.metrics(), metricsConfig);
   }
 
+  /**
+   * Returns the approximate length of the output file produced by this writer.
+   * <p>
+   * Prior to calling {@link ParquetWriter#close}, the result is approximate. After calling close, the length is
+   * exact.
+   */
   @Override
   public long length() {
     try {
-      return writer.getPos() + (writeStore.isColumnFlushNeeded() ? writeStore.getBufferedSize() : 0);
+      if (this.closed) {

Review comment:
       Nit: no need to use `this.` unless setting an instance variable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2001: iss1980 a special close flag added to indicate that the writer is closed

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2001:
URL: https://github.com/apache/iceberg/pull/2001#discussion_r551623457



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -170,6 +181,9 @@ private void flushRowGroup(boolean finished) {
   }
 
   private void startRowGroup() {
+    if (this.closed) {
+      throw new IllegalStateException("writer is closed");
+    }
     try {
       this.nextRowGroupSize = Math.min(writer.getNextRowGroupSize(), targetRowGroupSize);
     } catch (IOException e) {

Review comment:
       Can you add a blank line between control flow blocks?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org