Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/23 12:27:56 UTC

[GitHub] [arrow] liyafan82 opened a new pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

liyafan82 opened a new pull request #10789:
URL: https://github.com/apache/arrow/pull/10789


   Please see https://issues.apache.org/jira/browse/ARROW-5926


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-895830542


   > Can you confirm whether the new app can be integrated with the upstream fuzz project?
   
   I have enabled the new app in `debian-java`. I will confirm this later.





[GitHub] [arrow] liyafan82 commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#discussion_r685798036



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/validate/ValidateVectorTypeVisitor.java
##########
@@ -114,6 +114,25 @@ private void validateDateVector(ValueVector vector, DateUnit expectedDateUnit) {
         "Expecting date unit %s, actual date unit %s.", expectedDateUnit, dateType.getUnit());
   }
 
+  private void validateDecimalVector(ValueVector vector) {

Review comment:
       A unit test has been added. Thanks for the good suggestion.
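
As the review suggested, validation logic like this can be unit-tested by constructing the invalid input directly instead of relying on fuzz files. The body of `validateDecimalVector` is not shown in this thread, so the sketch below uses a hypothetical rule (precision must be between 1 and 38, the limit for 128-bit decimals) and plain Java in place of Arrow vectors and JUnit.

```java
/** Hypothetical stand-in for a decimal type check; not the PR's validateDecimalVector. */
class DecimalPrecisionCheck {
  // 38 decimal digits is the maximum precision of a 128-bit decimal.
  static final int MAX_PRECISION_128 = 38;

  /** Throws IllegalArgumentException if the precision is outside 1..38. */
  static void validatePrecision(int precision) {
    if (precision < 1 || precision > MAX_PRECISION_128) {
      throw new IllegalArgumentException(String.format(
          "Invalid decimal precision %d; expected 1..%d.", precision, MAX_PRECISION_128));
    }
  }

  /** Returns true if validatePrecision rejects the given precision. */
  static boolean isRejected(int precision) {
    try {
      validatePrecision(precision);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(isRejected(10)); // false: within range
    System.out.println(isRejected(0));  // true: below the minimum
    System.out.println(isRejected(39)); // true: above the maximum
  }
}
```

Each case targets one boundary, so a regression in the check shows up as a specific failing input rather than as an unexplained fuzz-file failure.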







[GitHub] [arrow] liyafan82 commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-931395141


   > > @emkornfield Sorry for my late reply. I am tied up with work and have very limited bandwidth these days.
   > 
   > No worries, I think we all are. Thank you for the contribution.
   > 
   > > I agree with you, and I do not think the current implementation (doing it through a UT) contradicts this objective, as new inputs will be tested automatically after being added to the repository. Maybe I misunderstood your point?
   > 
   > The difference here is that this would test only against files that generate an error in the C++ implementation (which has already gone through many rounds of fuzzing). If it can be enabled directly with OSS-Fuzz, then it will get tested separately and might uncover other edge cases.
   
   @emkornfield I see. Thanks for the pointer. I will set up the process.
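
For reference, OSS-Fuzz runs Java fuzz targets with Jazzer, which calls a static `fuzzerTestOneInput(byte[])` method with each generated input and reports any uncaught exception as a finding. The sketch below only illustrates that shape. `readArrowFile` is a hypothetical stand-in that checks the `ARROW1` magic bytes at both ends of an Arrow IPC file; a real target would pass the bytes to `ArrowFileReader` and run full validation. Exceptions that mean "input cleanly rejected" are caught, so only unexpected failures surface.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch of a Jazzer-style fuzz target (a real target would be a public class in its own file). */
class IpcFileFuzzerSketch {
  private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.US_ASCII);

  /** Hypothetical stand-in for reading and validating an Arrow IPC file. */
  static void readArrowFile(byte[] data) throws IOException {
    if (data.length < 2 * MAGIC.length) {
      throw new IOException("Input too short to be an Arrow IPC file");
    }
    byte[] head = Arrays.copyOfRange(data, 0, MAGIC.length);
    byte[] tail = Arrays.copyOfRange(data, data.length - MAGIC.length, data.length);
    if (!Arrays.equals(head, MAGIC) || !Arrays.equals(tail, MAGIC)) {
      throw new IllegalArgumentException("Missing ARROW1 magic bytes");
    }
    // A real target would parse the footer and record batches here and validate them.
  }

  /** Entry point the fuzzer calls once per generated input. */
  public static void fuzzerTestOneInput(byte[] data) {
    try {
      readArrowFile(data);
    } catch (IOException | IllegalArgumentException e) {
      // Malformed input rejected with an expected exception: not a bug.
    }
    // Any other exception propagates and would be reported as a crash.
  }
}
```

With this structure, a `NullPointerException` or `IndexOutOfBoundsException` escaping the reader would be flagged, which is the kind of edge case a Java-specific fuzzing run could find beyond the C++ corpus.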
   





[GitHub] [arrow] liyafan82 commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-922324693


   > > > Can you confirm whether the new app can be integrated with the upstream fuzz project?
   > > 
   > > 
   > > After some attempts, I do not think it is a good idea to run the test from the command line, as it requires too many dependencies.
   > > So I have transformed the test into a UT (its running time is short).
   > 
   > Sorry I missed this response. I think integrating it with the upstream fuzzing library is important because it means it will get tested against new random inputs that might expose issues in the future. What do you mean by too many dependencies?
   
   @emkornfield Sorry for my late reply. I am tied up with work and have very limited bandwidth these days.
   
   > I think integrating it with the upstream fuzzing library is important because it means it will get tested against new random inputs that might expose issues in the future. 
   
   I agree with you, and I do not think the current implementation (doing it through a UT) contradicts this objective, as new inputs will be tested automatically after being added to the repository. Maybe I misunderstood your point?
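
The "new inputs are tested automatically" property comes from scanning a directory on every run: any file matching a glob becomes a test case as soon as it is committed. A minimal stdlib-only sketch, using `java.nio.file.Files.newDirectoryStream(dir, glob)` in place of the Commons IO `WildcardFileFilter` from the quoted `getTestFiles`; the `check` predicate is a hypothetical stand-in for reading and validating one IPC file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

class FuzzCorpusScanSketch {
  /** Lists regular files in dir whose names match the glob, sorted for a stable order. */
  static List<Path> listTestFiles(Path dir, String glob) throws IOException {
    List<Path> files = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
      for (Path p : stream) {
        if (Files.isRegularFile(p)) {
          files.add(p);
        }
      }
    }
    Collections.sort(files);
    return files;
  }

  /** Runs the (hypothetical) check on every matching file; returns names of files that failed. */
  static List<String> runAll(Path dir, String glob, Predicate<Path> check) throws IOException {
    List<String> failures = new ArrayList<>();
    for (Path p : listTestFiles(dir, glob)) {
      if (!check.test(p)) {
        failures.add(p.getFileName().toString());
      }
    }
    return failures;
  }
}
```

Returning all failures, rather than stopping at the first one, makes a single run report every regressed input.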
   
   > What do you mean by too many dependencies?
   
   If we run the test from the command line, we find that it requires too many JAR files on the classpath (some referenced directly, others transitively). This makes the command extremely long (possibly spanning a few pages) and unwieldy.
   
   This is easy to verify, as some IDEs (e.g. IntelliJ IDEA) print the full command line in the console for each unit test or application being run.
   





[GitHub] [arrow] emkornfield commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#discussion_r685679475



##########
File path: java/tools/src/test/java/org/apache/arrow/tools/TestIpcFuzz.java
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.tools;
+
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+import java.io.File;
+import java.io.FileFilter;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
+import org.apache.arrow.vector.util.ValueVectorUtility;
+import org.apache.commons.io.filefilter.WildcardFileFilter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TestIpcFuzz {
+
+  static final List<String> WHITE_LIST = new ArrayList<>();
+
+  private static final Logger LOGGER = LoggerFactory.getLogger(TestIpcFuzz.class);
+
+  static {
+    WHITE_LIST.add("clusterfuzz-testcase-minimized-arrow-ipc-file-fuzz-5707423356813312");
+  }
+
+  static void readIpcFile(File ipcFile) {
+    BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
+    try (ArrowFileReader reader = new ArrowFileReader(new FileInputStream(ipcFile).getChannel(), allocator)) {
+      VectorSchemaRoot root = reader.getVectorSchemaRoot();
+
+      // validate schema
+      ValueVectorUtility.validate(root);
+
+      while (reader.loadNextBatch()) {
+        ValueVectorUtility.validateFull(root);
+      }
+    } catch (IOException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  static void readIpcStream(File ipcFile) {
+    BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
+    try (ArrowStreamReader reader = new ArrowStreamReader(new FileInputStream(ipcFile).getChannel(), allocator)) {
+      VectorSchemaRoot root = reader.getVectorSchemaRoot();
+
+      // validate schema
+      ValueVectorUtility.validate(root);
+
+      while (reader.loadNextBatch()) {
+        ValueVectorUtility.validateFull(root);
+      }
+    } catch (IOException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  static File[] getTestFiles(String testFilePath) {
+    int idx = testFilePath.lastIndexOf(File.separator);
+    File directory = new File(testFilePath.substring(0, idx));
+    String filter = testFilePath.substring(idx + 1);
+    FileFilter fileFilter = new WildcardFileFilter(filter);
+    return directory.listFiles(fileFilter);
+  }
+
+  static void testFuzz(boolean stream, File test) {
+    if (stream) {
+      readIpcStream(test);
+    } else {
+      readIpcFile(test);
+    }
+  }
+
+  public static void main(String[] args) {
+    if (args.length < 2) {
+      LOGGER.error("Usage: <cmd> [stream|file] <test file path>");
+      System.exit(1);
+    }
+
+    final boolean stream;
+    if (args[0].equalsIgnoreCase("file")) {
+      stream = false;
+    } else if (args[0].equalsIgnoreCase("stream")) {
+      stream = true;
+    } else {
+      throw new IllegalArgumentException("The first argument must be file or stream");
+    }
+
+    File[] testFiles = getTestFiles(args[1]);
+    for (File test : testFiles) {
+      LOGGER.info("Testing file " + test.getName());
+      if (WHITE_LIST.contains(test.getName())) {
+        testFuzz(stream, test);
+        LOGGER.info("Test finished successfully.");
+      } else {
+        Exception e = assertThrows(Exception.class,

Review comment:
       Wouldn't we want a specific exception here? Does this have a non-zero exit code if it fails?
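
On the exit-code question: when `main` lets an exception escape (for example the `AssertionFailedError` thrown by a failing `assertThrows`), the `java` launcher exits with a non-zero status. Making that explicit can be easier to read and test. In the sketch below, the per-file logic lives in a method that returns the exit code and `main` only calls `System.exit`; the `Check` interface and the "file must exist" check are hypothetical, for illustration only.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;

class FuzzDriverSketch {
  /** Hypothetical per-file check: returns normally on success, throws on failure. */
  interface Check {
    void run(String fileName) throws Exception;
  }

  /** Runs the check on every file; returns 0 if all pass and 1 if any throws an Exception. */
  static int run(List<String> files, Check check) {
    int failures = 0;
    for (String f : files) {
      try {
        check.run(f);
      } catch (Exception e) {
        failures++;
        System.err.println("FAILED " + f + ": " + e);
      }
    }
    return failures == 0 ? 0 : 1;
  }

  public static void main(String[] args) {
    // Hypothetical check: each argument must name an existing regular file.
    int code = run(Arrays.asList(args), f -> {
      if (!new File(f).isFile()) {
        throw new FileNotFoundException(f);
      }
    });
    System.exit(code);
  }
}
```

Keeping `System.exit` out of `run` means the same logic can be called from a unit test without terminating the test JVM.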







[GitHub] [arrow] emkornfield commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#discussion_r685679951



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/validate/ValidateVectorTypeVisitor.java
##########
@@ -114,6 +114,25 @@ private void validateDateVector(ValueVector vector, DateUnit expectedDateUnit) {
         "Expecting date unit %s, actual date unit %s.", expectedDateUnit, dateType.getUnit());
   }
 
+  private void validateDecimalVector(ValueVector vector) {

Review comment:
       Is this only testable via the fuzz files, or can a unit test be added?







[GitHub] [arrow] liyafan82 commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-1020084856


   > Does this need to be merged before the OSS-Fuzz PR?
   
   Maybe not. The OSS-Fuzz PR is independent of this one. However, some bugs in this PR need to be fixed.
   I will close this and submit another PR later. Thanks for your attention.





[GitHub] [arrow] emkornfield commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-917714489


   > > Can you confirm whether the new app can be integrated with the upstream fuzz project?
   > 
   > After some attempts, I do not think it is a good idea to run the test from the command line, as it requires too many dependencies.
   > So I have transformed the test into a UT (its running time is short).
   
   Sorry I missed this response. I think integrating it with the upstream fuzzing library is important because it means it will get tested against new random inputs that might expose issues in the future. What do you mean by too many dependencies?





[GitHub] [arrow] pitrou commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-1019965461


   Does this need to be merged before the OSS-Fuzz PR?





[GitHub] [arrow] liyafan82 commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#discussion_r685795711



##########
File path: java/tools/src/test/java/org/apache/arrow/tools/TestIpcFuzz.java
##########
@@ -0,0 +1,125 @@
+public class TestIpcFuzz {
+
+  static final List<String> WHITE_LIST = new ArrayList<>();
+
+  private static final Logger LOGGER = LoggerFactory.getLogger(TestIpcFuzz.class);
+
+  static {
+    WHITE_LIST.add("clusterfuzz-testcase-minimized-arrow-ipc-file-fuzz-5707423356813312");

Review comment:
       Sure. Comment added.
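
In the quoted code, the whitelist holds the names of fuzz inputs that the Java reader is expected to read and validate without throwing; every other file is expected to fail with an exception. A commented, immutable version of that idea might look like this (the comment wording and class name are illustrative, not what was actually added to the PR):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class FuzzExpectations {
  /**
   * Fuzz inputs that the Java reader is expected to read and validate without throwing.
   * Every other file in the corpus is expected to be rejected with an exception.
   */
  static final Set<String> EXPECTED_VALID = Collections.unmodifiableSet(new HashSet<>(
      Collections.singletonList(
          "clusterfuzz-testcase-minimized-arrow-ipc-file-fuzz-5707423356813312")));

  /** Returns true if the named file is expected to pass validation. */
  static boolean expectedToPass(String fileName) {
    return EXPECTED_VALID.contains(fileName);
  }
}
```

Wrapping the set as unmodifiable prevents other code from changing the expectations at runtime, which a mutable `static final List` would allow.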







[GitHub] [arrow] liyafan82 commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#discussion_r685797491



##########
File path: java/tools/src/test/java/org/apache/arrow/tools/TestIpcFuzz.java
##########
@@ -0,0 +1,125 @@
+  public static void main(String[] args) {
+    if (args.length < 2) {
+      LOGGER.error("Usage: <cmd> [stream|file] <test file path>");
+      System.exit(1);
+    }
+
+    final boolean stream;
+    if (args[0].equalsIgnoreCase("file")) {
+      stream = false;
+    } else if (args[0].equalsIgnoreCase("stream")) {
+      stream = true;
+    } else {
+      throw new IllegalArgumentException("The first argument must be file or stream");
+    }
+
+    File[] testFiles = getTestFiles(args[1]);
+    for (File test : testFiles) {
+      LOGGER.info("Testing file " + test.getName());
+      if (WHITE_LIST.contains(test.getName())) {
+        testFuzz(stream, test);
+        LOGGER.info("Test finished successfully.");
+      } else {
+        Exception e = assertThrows(Exception.class,

Review comment:
       > Wouldn't we want a specific exception here?
   
   I am afraid there is no single specific exception here, because some exceptions are thrown when validating the vector (ValidateException), while others may be thrown when creating the schema/vector (possibly IllegalArgumentException).
   
   > Does this have a non-zero exit code if it fails?
   
   Yes.
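
When no single exception type covers every failure mode, the check can still be narrower than `Exception.class` by listing the types that count as a clean rejection. A plain-Java sketch of that idea follows; `expectOneOf` is a hypothetical helper, and in this PR's setting the allowed types would be the validation exception and `IllegalArgumentException` mentioned above.

```java
import java.util.Arrays;

class ExpectOneOf {
  /** An action that may throw any exception. */
  interface ThrowingRunnable {
    void run() throws Throwable;
  }

  /**
   * Runs the action and returns what it threw if it is an instance of one of the allowed types.
   * Throws AssertionError if nothing is thrown or the thrown type is not allowed.
   */
  @SafeVarargs
  static Throwable expectOneOf(ThrowingRunnable action, Class<? extends Throwable>... allowed) {
    try {
      action.run();
    } catch (Throwable t) {
      for (Class<? extends Throwable> type : allowed) {
        if (type.isInstance(t)) {
          return t;
        }
      }
      throw new AssertionError("Unexpected exception type " + t.getClass().getName(), t);
    }
    throw new AssertionError("Expected one of " + Arrays.toString(allowed) + " but nothing was thrown");
  }
}
```

Compared with `assertThrows(Exception.class, ...)`, this still fails on an unrelated error such as a `NullPointerException`, which is usually the signal a fuzz test is meant to catch.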







[GitHub] [arrow] github-actions[bot] commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-885603623


   https://issues.apache.org/jira/browse/ARROW-5926





[GitHub] [arrow] liyafan82 closed pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 closed pull request #10789:
URL: https://github.com/apache/arrow/pull/10789


   





[GitHub] [arrow] emkornfield commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-927225717


   > @emkornfield Sorry for my late reply. I am tied up with work and have very limited bandwidth these days.
   
   No worries, I think we all are. Thank you for the contribution.
   
   > I agree with you, and I do not think the current implementation (doing it through a UT) contradicts this objective, as new inputs will be tested automatically after being added to the repository. Maybe I misunderstood your point?
   
   The difference here is that this would test only against files that generate an error in the C++ implementation (which has already gone through many rounds of fuzzing). If it can be enabled directly with OSS-Fuzz, then it will get tested separately and might uncover other edge cases.





[GitHub] [arrow] emkornfield commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#discussion_r685679286



##########
File path: java/tools/src/test/java/org/apache/arrow/tools/TestIpcFuzz.java
##########
@@ -0,0 +1,125 @@
+public class TestIpcFuzz {
+
+  static final List<String> WHITE_LIST = new ArrayList<>();
+
+  private static final Logger LOGGER = LoggerFactory.getLogger(TestIpcFuzz.class);
+
+  static {
+    WHITE_LIST.add("clusterfuzz-testcase-minimized-arrow-ipc-file-fuzz-5707423356813312");

Review comment:
       Can you add a comment explaining what this whitelist is for?







[GitHub] [arrow] liyafan82 commented on pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #10789:
URL: https://github.com/apache/arrow/pull/10789#issuecomment-895966651


   > Can you confirm whether the new app can be integrated with the upstream fuzz project?
   
   After some attempts, I do not think it is a good idea to run the test from the command line, as it requires too many dependencies.
   So I have transformed the test into a UT (its running time is short).

