Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/17 17:02:46 UTC

[GitHub] [arrow-rs] tustvold opened a new issue #1053: Parquet Fuzz Tests

tustvold opened a new issue #1053:
URL: https://github.com/apache/arrow-rs/issues/1053


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Whilst working on #1037 I introduced bugs that were then caught by the arrow array benchmarks.
   
   It would therefore appear that these benchmarks exercise code paths not covered by the other tests, and we could increase test coverage by including some variant of them as tests.
   
   **Describe the solution you'd like**
   
   A set of fuzz tests that create various types of `PageIterator` with multiple column chunks, and multiple pages per column chunk. This can likely reuse much of the fuzz plumbing found in the arrow_array_reader benchmarks.
   
   The tests would then use the `ArrayReader` abstractions to read this data and verify it is what was written.
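
A minimal sketch of the shape such a round-trip fuzz test could take, using only the standard library. Everything here is a hypothetical stand-in: `Page`, `ColumnChunk`, `encode_page`, `decode_page`, `read_batches` and the `XorShift` generator are illustrative toys, not the parquet crate's `PageIterator` or `ArrayReader` APIs. The point is only the structure: generate a random layout of column chunks and pages, read it back in batches that straddle page and chunk boundaries, and compare with what was written.

```rust
/// Tiny xorshift PRNG so the sketch needs no external crates (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Pseudo-random value in `lo..=hi`.
    fn range(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u64() as usize) % (hi - lo + 1)
    }
}

/// Hypothetical stand-in for an encoded data page: little-endian i32 values.
struct Page {
    bytes: Vec<u8>,
}

/// Hypothetical stand-in for a column chunk: an ordered list of pages.
struct ColumnChunk {
    pages: Vec<Page>,
}

fn encode_page(values: &[i32]) -> Page {
    let mut bytes = Vec::with_capacity(values.len() * 4);
    for v in values {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    Page { bytes }
}

fn decode_page(page: &Page) -> Vec<i32> {
    page.bytes
        .chunks_exact(4)
        .map(|b| i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

/// Build a random layout (1-4 chunks, 1-5 pages each, 0-64 values per page)
/// and return it together with the flat sequence of values that was written.
fn generate(rng: &mut XorShift) -> (Vec<ColumnChunk>, Vec<i32>) {
    let mut expected = Vec::new();
    let mut chunks = Vec::new();
    for _ in 0..rng.range(1, 4) {
        let mut pages = Vec::new();
        for _ in 0..rng.range(1, 5) {
            let len = rng.range(0, 64);
            let values: Vec<i32> = (0..len).map(|_| rng.next_u64() as i32).collect();
            expected.extend_from_slice(&values);
            pages.push(encode_page(&values));
        }
        chunks.push(ColumnChunk { pages });
    }
    (chunks, expected)
}

/// Read everything back in fixed-size batches that may span page and chunk boundaries.
fn read_batches(chunks: &[ColumnChunk], batch_size: usize) -> Vec<Vec<i32>> {
    let mut batches = Vec::new();
    let mut current = Vec::with_capacity(batch_size);
    for chunk in chunks {
        for page in &chunk.pages {
            for value in decode_page(page) {
                current.push(value);
                if current.len() == batch_size {
                    batches.push(std::mem::replace(&mut current, Vec::with_capacity(batch_size)));
                }
            }
        }
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

/// Run `iterations` random round trips; panics on the first mismatch.
/// Returns the total number of values that were verified.
fn run_fuzz(seed: u64, iterations: usize) -> usize {
    let mut rng = XorShift(seed);
    let mut total = 0;
    for _ in 0..iterations {
        let (chunks, expected) = generate(&mut rng);
        let batch_size = rng.range(1, 32);
        let actual: Vec<i32> = read_batches(&chunks, batch_size).into_iter().flatten().collect();
        assert_eq!(actual, expected, "round trip mismatch (seed {})", seed);
        total += expected.len();
    }
    total
}

fn main() {
    let checked = run_fuzz(0x5EED, 100);
    println!("verified {} values", checked);
}
```

Seeding the generator explicitly keeps any failure reproducible. In a real test the toy encode/decode pair would presumably be replaced by the existing page-generation plumbing and the `ArrayReader` under test.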
   
   **Describe alternatives you've considered**
   
   We could choose not to add fuzz tests, but that would increase the likelihood of regressions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] chadbrewbaker edited a comment on issue #1053: Parquet Fuzz Tests

Posted by GitBox <gi...@apache.org>.
chadbrewbaker edited a comment on issue #1053:
URL: https://github.com/apache/arrow-rs/issues/1053#issuecomment-997457311


   After thinking about this for a week, I'm inclined to start by driving the tests with [Arrow Python/Hypothesis](https://github.com/apache/arrow/blob/master/python/pyarrow/tests/strategies.py) and the [Python Parquet tests](https://github.com/apache/arrow/tree/master/python/pyarrow/tests/parquet), then gradually add Proptest. AWS Labs has the [best proptest examples](https://github.com/search?q=org%3Aawslabs+proptest).
   
   Zooming out a bit more, DataFusion needs to be integrated into the [squirrel](https://github.com/s3team/Squirrel) and [sqlancer](https://github.com/sqlancer/sqlancer) cross-SQL-engine tests. We can use [sqlsmith](https://github.com/anse1/sqlsmith) for reductions of large queries.
   
   We also want to be like AWS Redshift, where you write a query in Python/SQL and it emits Rust code that gets compiled and sent to worker nodes.
   
   It seems we might need thin LTO even on dev builds to reduce false positives: https://github.com/awslabs/rust-smt-ir/blob/551565ea5e97f502269d74d189e2e2c1e6b52f40/Cargo.toml#L11
   
   
   
   
   





[GitHub] [arrow-rs] tustvold commented on issue #1053: Parquet Fuzz Tests

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1053:
URL: https://github.com/apache/arrow-rs/issues/1053#issuecomment-1002727645


   FYI, I'm experimenting with extending the [existing fuzz tests](https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_reader.rs#L459) to support nulls, dictionaries, etc.





[GitHub] [arrow-rs] chadbrewbaker commented on issue #1053: Parquet Fuzz Tests

Posted by GitBox <gi...@apache.org>.
chadbrewbaker commented on issue #1053:
URL: https://github.com/apache/arrow-rs/issues/1053#issuecomment-997457311


   After thinking about this for a week, I'm inclined to start by driving the tests with [Arrow Python/Hypothesis](https://github.com/apache/arrow/blob/master/python/pyarrow/tests/strategies.py) and the [Python Parquet tests](https://github.com/apache/arrow/tree/master/python/pyarrow/tests/parquet), then gradually add Proptest. AWS Labs has the [best proptest examples](https://github.com/search?q=org%3Aawslabs+proptest).
   
   Zooming out a bit more, DataFusion needs to be integrated into the [sqlancer](https://github.com/sqlancer/sqlancer) cross-SQL-engine tests. We also want to be like AWS Redshift, where you write a query in Python/SQL and it emits Rust code that gets compiled and sent to worker nodes.
   
   
   
   
   





[GitHub] [arrow-rs] alamb closed issue #1053: Parquet Fuzz Tests

Posted by GitBox <gi...@apache.org>.
alamb closed issue #1053:
URL: https://github.com/apache/arrow-rs/issues/1053


   

