Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2020/10/08 14:41:01 UTC

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #824: PARQUET-1920: Fix Parquet writer's memory check interval calculation

gszadovszky commented on a change in pull request #824:
URL: https://github.com/apache/parquet-mr/pull/824#discussion_r501773921



##########
File path: parquet-common/src/main/java/org/apache/parquet/bytes/BytesInput.java
##########
@@ -215,6 +215,13 @@ public static BytesInput copy(BytesInput bytesInput) throws IOException {
    * @throws IOException if there is an exception reading
    */
   public byte[] toByteArray() throws IOException {
+    long size = size();
+    if (size > Integer.MAX_VALUE) {
+      throw new IOException("Page size, " + size + ", is larger than allowed " + Integer.MAX_VALUE +"." +
+        " Usually caused by a Parquet writer writing too big column chunks on encountering highly skewed dataset." +
+        " Please set page.size.row.check.max to a lower value on the writer, default value is 10000." +
+        " You can try setting it to " + (10000 / (size/ Integer.MAX_VALUE)) + " or lower.");

Review comment:
       nit: add the missing spaces around `+` (before `"."`) and `/` (in `size / Integer.MAX_VALUE`):
   ```suggestion
         throw new IOException("Page size, " + size + ", is larger than allowed " + Integer.MAX_VALUE + "." +
           " Usually caused by a Parquet writer writing too big column chunks on encountering highly skewed dataset." +
           " Please set page.size.row.check.max to a lower value on the writer, default value is 10000." +
           " You can try setting it to " + (10000 / (size / Integer.MAX_VALUE)) + " or lower.");
   ```
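
For reference, the arithmetic behind the suggested value in the message can be sketched in isolation. This is only an illustration, not code from the PR: `PageSizeCheck`, `suggestedRowCheckMax` and `checkPageSize` are hypothetical names, and the message is shortened. The constant 10000 is the default of `page.size.row.check.max` as quoted in the message itself.

```java
import java.io.IOException;

final class PageSizeCheck {
  // Default of page.size.row.check.max, as stated in the error message above.
  static final long DEFAULT_ROW_CHECK_MAX = 10000L;

  // Hypothetical helper: the value the message proposes for a given page size.
  static long suggestedRowCheckMax(long size) {
    // Both divisions are long integer divisions. This is only reached when
    // size > Integer.MAX_VALUE, so the inner quotient is at least 1 and the
    // outer division cannot divide by zero.
    return DEFAULT_ROW_CHECK_MAX / (size / Integer.MAX_VALUE);
  }

  // Hypothetical helper mirroring the guard added to toByteArray(),
  // with a shortened message.
  static void checkPageSize(long size) throws IOException {
    if (size > Integer.MAX_VALUE) {
      throw new IOException("Page size, " + size + ", is larger than allowed " + Integer.MAX_VALUE + "."
          + " You can try setting it to " + suggestedRowCheckMax(size) + " or lower.");
    }
  }
}
```

With this sketch, a size of `3L * Integer.MAX_VALUE` yields a suggestion of 3333. Because of the integer division, any size between `Integer.MAX_VALUE + 1L` and just under `2L * Integer.MAX_VALUE` yields 10000, which is the default again; the "or lower" wording in the message covers that case.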



