You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2021/09/22 15:44:57 UTC

[GitHub] [parquet-mr] gszadovszky opened a new pull request #933: PARQUET-2094: Handle negative values in page headers

gszadovszky opened a new pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
     - https://issues.apache.org/jira/browse/PARQUET-XXX
     - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] plygrnd commented on pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
plygrnd commented on pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#issuecomment-925872659


   Okay, LGTM then. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r717364884



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/MetadataValidator.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+/**
+ * Utility class to validate different types of Parquet metadata (e.g. footer, page headers etc.).
+ */
+public class MetadataValidator {
+
+  /**
+   * A specific RuntimeException thrown when invalid values are found in the Parquet file metadata (including the
+   * footer, page header etc.).
+   */
+  public static class InvalidParquetMetadataException extends RuntimeException {

Review comment:
       It is a fair point anyway. I'll move it before merging. Thanks a lot for the review.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r717768759



##########
File path: parquet-format-structures/src/test/java/org/apache/parquet/format/TestUtil.java
##########
@@ -23,13 +23,16 @@
 import static junit.framework.Assert.assertNull;
 import static org.apache.parquet.format.Util.readFileMetaData;
 import static org.apache.parquet.format.Util.writeFileMetaData;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
 
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
+import java.io.IOException;
 
+import org.apache.parquet.format.Util.DefaultFileMetaDataConsumer;
 import org.junit.Test;
 
-import org.apache.parquet.format.Util.DefaultFileMetaDataConsumer;

Review comment:
       Nit: unnecessary change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r716970875



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/MetadataValidator.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+/**
+ * Utility class to validate different types of Parquet metadata (e.g. footer, page headers etc.).
+ */
+public class MetadataValidator {
+
+  /**
+   * A specific RuntimeException thrown when invalid values are found in the Parquet file metadata (including the
+   * footer, page header etc.).
+   */
+  public static class InvalidParquetMetadataException extends RuntimeException {

Review comment:
       Minor: I'd prefer it if the exception weren't an inner class since that makes it harder to reference. But this isn't a blocker.

##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/InvalidParquetMetadataException.java
##########
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+/**
+ * A specific RuntimeException thrown when invalid values are found in the Parquet file metadata (including the footer,
+ * page header etc.).
+ */
+public class InvalidParquetMetadataException extends RuntimeException {
+  <T> InvalidParquetMetadataException(String message) {

Review comment:
       What is the type parameter for?

##########
File path: parquet-format-structures/src/test/java/org/apache/parquet/format/TestUtil.java
##########
@@ -23,13 +23,16 @@
 import static junit.framework.Assert.assertNull;
 import static org.apache.parquet.format.Util.readFileMetaData;
 import static org.apache.parquet.format.Util.writeFileMetaData;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
 
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
+import java.io.IOException;
 
+import org.apache.parquet.format.Util.DefaultFileMetaDataConsumer;
 import org.junit.Test;
 
-import org.apache.parquet.format.Util.DefaultFileMetaDataConsumer;

Review comment:
       Nit: unnecessary change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] plygrnd commented on pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
plygrnd commented on pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#issuecomment-925773075


   TravisCI runs contain multiple `ParquetDecodingException`s ([example](https://app.travis-ci.com/github/apache/parquet-mr/jobs/538955407#L2605)). Is this expected?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r717364884



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/MetadataValidator.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+/**
+ * Utility class to validate different types of Parquet metadata (e.g. footer, page headers etc.).
+ */
+public class MetadataValidator {
+
+  /**
+   * A specific RuntimeException thrown when invalid values are found in the Parquet file metadata (including the
+   * footer, page header etc.).
+   */
+  public static class InvalidParquetMetadataException extends RuntimeException {

Review comment:
       It is a fair point anyway. I'll move it before merging. Thanks a lot for the review.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#issuecomment-925864508


   > TravisCI runs contain multiple `ParquetDecodingException`s ([example](https://app.travis-ci.com/github/apache/parquet-mr/jobs/538955407#L2605)). Is this expected?
   
   @plygrnd, yes this is expected. The test class [TestCorruptThriftRecords](https://github.com/apache/parquet-mr/blob/master/parquet-thrift/src/test/java/org/apache/parquet/hadoop/thrift/TestCorruptThriftRecords.java) verifies if the correct exceptions are thrown. It uses MR to execute the tests and it seems MR logs the exceptions before they are thrown to the caller.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky merged pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
gszadovszky merged pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r715155364



##########
File path: parquet-format-structures/src/test/java/org/apache/parquet/format/TestUtil.java
##########
@@ -77,6 +81,21 @@ public void testReadFileMetadata() throws Exception {
     assertEquals(md, md6);
   }
 
+  @Test
+  public void testInvalidPageHeader() throws IOException {
+    PageHeader ph = new PageHeader(PageType.DATA_PAGE, 100, -50);
+    ByteArrayOutputStream out = new ByteArrayOutputStream();
+    Util.writePageHeader(ph, out);
+
+    try {
+      Util.readPageHeader(in(out));
+      fail("Expected exception but did not thrown");
+    } catch (InvalidParquetMetadataException e) {
+      assertTrue("Exception message does not contain the expected parts",
+          e.getMessage().contains("pageHeader.compressed_page_size"));

Review comment:
       Isn't there an assertion helper so you don't need to catch the exception? Something like assertThrows in the codebase?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r715154831



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/Util.java
##########
@@ -130,7 +131,7 @@ public static PageHeader readPageHeader(InputStream from) throws IOException {
 
   public static PageHeader readPageHeader(InputStream from, 
       BlockCipher.Decryptor decryptor, byte[] AAD) throws IOException {
-    return read(from, new PageHeader(), decryptor, AAD);
+    return validate(read(from, new PageHeader(), decryptor, AAD));

Review comment:
       I think it would be more clear if you called `MetadataValidator.validate` rather than just `validate`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#issuecomment-933027683


   @gszadovszky, thanks for getting this done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r715153249



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/MetadataValidator.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+import java.io.IOException;
+import java.util.function.Predicate;
+
+/**
+ * Utility class to validate different types of Parquet metadata (e.g. footer, page headers etc.).
+ */
+public class MetadataValidator {
+
+  /**
+   * A specific IOException thrown when invalid values are found in the Parquet file metadata (including the footer,
+   * page header etc.).
+   */
+  public static class InvalidParquetMetadataException extends IOException {

Review comment:
       I don't think that this is an IOException. What is the value of making this a checked exception? Why not just make this a RuntimeException? Or use some existing one like IllegalStateException or ParquetDecodingException?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r717768214



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/InvalidParquetMetadataException.java
##########
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+/**
+ * A specific RuntimeException thrown when invalid values are found in the Parquet file metadata (including the footer,
+ * page header etc.).
+ */
+public class InvalidParquetMetadataException extends RuntimeException {
+  <T> InvalidParquetMetadataException(String message) {

Review comment:
       What is the type parameter for?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r716450578



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/MetadataValidator.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+import java.io.IOException;
+import java.util.function.Predicate;
+
+/**
+ * Utility class to validate different types of Parquet metadata (e.g. footer, page headers etc.).
+ */
+public class MetadataValidator {
+
+  /**
+   * A specific IOException thrown when invalid values are found in the Parquet file metadata (including the footer,
+   * page header etc.).
+   */
+  public static class InvalidParquetMetadataException extends IOException {

Review comment:
       The module `parquet-format-structures` is the one all the others are depended on. Parquet exceptions are implemented in another module so I cannot use them here. Since we already throw IOExceptions I've felt extending it would be a good idea. But you might be right. I am happy to extend RuntimeException instead of IOException.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] shangxinli commented on pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
shangxinli commented on pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#issuecomment-925915291


   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r716453533



##########
File path: parquet-format-structures/src/test/java/org/apache/parquet/format/TestUtil.java
##########
@@ -77,6 +81,21 @@ public void testReadFileMetadata() throws Exception {
     assertEquals(md, md6);
   }
 
+  @Test
+  public void testInvalidPageHeader() throws IOException {
+    PageHeader ph = new PageHeader(PageType.DATA_PAGE, 100, -50);
+    ByteArrayOutputStream out = new ByteArrayOutputStream();
+    Util.writePageHeader(ph, out);
+
+    try {
+      Util.readPageHeader(in(out));
+      fail("Expected exception but did not thrown");
+    } catch (InvalidParquetMetadataException e) {
+      assertTrue("Exception message does not contain the expected parts",
+          e.getMessage().contains("pageHeader.compressed_page_size"));

Review comment:
       Yes, there is something already implemented but in another module and I cannot use it here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r716970875



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/MetadataValidator.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+/**
+ * Utility class to validate different types of Parquet metadata (e.g. footer, page headers etc.).
+ */
+public class MetadataValidator {
+
+  /**
+   * A specific RuntimeException thrown when invalid values are found in the Parquet file metadata (including the
+   * footer, page header etc.).
+   */
+  public static class InvalidParquetMetadataException extends RuntimeException {

Review comment:
       Minor: I'd prefer it if the exception weren't an inner class since that makes it harder to reference. But this isn't a blocker.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r716452332



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/MetadataValidator.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+import java.io.IOException;
+import java.util.function.Predicate;
+
+/**
+ * Utility class to validate different types of Parquet metadata (e.g. footer, page headers etc.).
+ */
+public class MetadataValidator {
+
+  /**
+   * A specific IOException thrown when invalid values are found in the Parquet file metadata (including the footer,
+   * page header etc.).
+   */
+  public static class InvalidParquetMetadataException extends IOException {
+    private <T> InvalidParquetMetadataException(String metaName, T value) {
+      super("Metadata " + metaName + " is invalid: " + value);
+    }
+  }
+
+  static PageHeader validate(PageHeader pageHeader) throws InvalidParquetMetadataException {
+    validateValue(size -> size >= 0, pageHeader.getCompressed_page_size(), "pageHeader.compressed_page_size");
+    return pageHeader;
+  }
+
+  private static <T> void validateValue(Predicate<? super T> validator, T value, String metaName)

Review comment:
       I am not sure why I've implemented this way. I'm fine rewriting to use a simple boolean.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] rdblue commented on a change in pull request #933: PARQUET-2094: Handle negative values in page headers

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #933:
URL: https://github.com/apache/parquet-mr/pull/933#discussion_r715154439



##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/MetadataValidator.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+import java.io.IOException;
+import java.util.function.Predicate;
+
+/**
+ * Utility class to validate different types of Parquet metadata (e.g. footer, page headers etc.).
+ */
+public class MetadataValidator {
+
+  /**
+   * A specific IOException thrown when invalid values are found in the Parquet file metadata (including the footer,
+   * page header etc.).
+   */
+  public static class InvalidParquetMetadataException extends IOException {
+    private <T> InvalidParquetMetadataException(String metaName, T value) {
+      super("Metadata " + metaName + " is invalid: " + value);
+    }
+  }
+
+  static PageHeader validate(PageHeader pageHeader) throws InvalidParquetMetadataException {
+    validateValue(size -> size >= 0, pageHeader.getCompressed_page_size(), "pageHeader.compressed_page_size");
+    return pageHeader;
+  }
+
+  private static <T> void validateValue(Predicate<? super T> validator, T value, String metaName)

Review comment:
       Why accept a predicate? Most check methods like this use a boolean. I would expect it to work like a Precondition:
   
   ```java
   if (!isValid) {
     throw new ParquetDecodingException(...);
   }
   
   int size = pageHeader.getCompressed_page_size()
   validateValue(size >= 0, String.format("Compressed page size must be positive, but was: %s", size));
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org