You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/25 20:01:44 UTC
[GitHub] [arrow-rs] alamb opened a new issue #614: Get MIRI running against parquet crate
alamb opened a new issue #614:
URL: https://github.com/apache/arrow-rs/issues/614
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
The MIRI tool helps crates ensure they have no undefined behavior https://github.com/rust-lang/miri
To improve the stability / security of the parquet crate, it would be cool to run MIRI against that too.
**Describe the solution you'd like**
Ideally, we would run same kind of MIRI checks on the parquet crate as we do now on arrow: https://github.com/apache/arrow-rs/blob/master/.github/workflows/miri.yaml#L60
This means a command something like
```shell
cargo +nightly miri test -p parquet
```
Sadly, when I run this locally the very first test fails with an alignment issue
```shell
(arrow_dev) alamb@MacBook-Pro:~/Software/arrow-rs$ cargo +nightly miri test -p parquet
WARNING: Ignoring `RUSTC_WRAPPER` environment variable, Miri does not support wrapping.
Compiling arrow v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/arrow)
Compiling parquet v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/parquet)
Finished test [unoptimized + debuginfo] target(s) in 4.53s
Running unittests (target/miri/x86_64-apple-darwin/debug/deps/parquet-7056b5943fd0017c)
running 456 tests
test arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ... error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required
--> parquet/src/util/bit_packing.rs:150:12
|
150 | *out = (*in_buf) % (1u32 << 2);
| ^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `util::bit_packing::unpack2_32` at parquet/src/util/bit_packing.rs:150:12
note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:37:14
--> parquet/src/util/bit_packing.rs:37:14
|
37 | 2 => unpack2_32(in_ptr, out_ptr),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `util::bit_util::BitReader::get_batch::<i16>` at parquet/src/util/bit_util.rs:568:30
--> parquet/src/util/bit_util.rs:568:30
|
568 | in_ptr = unpack32(in_ptr, out_ptr, num_bits);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `encodings::rle::RleDecoder::get_batch::<i16>` at parquet/src/encodings/rle.rs:415:30
--> parquet/src/encodings/rle.rs:415:30
|
415 | num_values = bit_reader.get_batch::<T>(
| ______________________________^
416 | | &mut buffer[values_read..values_read + num_values],
417 | | self.bit_width as usize,
418 | | );
| |_________________^
note: inside `encodings::levels::LevelDecoder::get` at parquet/src/encodings/levels.rs:262:35
--> parquet/src/encodings/levels.rs:262:35
|
262 | let values_read = decoder.get_batch::<i16>(&mut buffer[0..len])?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_def_levels` at parquet/src/column/reader.rs:459:9
--> parquet/src/column/reader.rs:459:9
|
459 | level_decoder.get(buffer)
| ^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_batch` at parquet/src/column/reader.rs:210:38
--> parquet/src/column/reader.rs:210:38
|
210 | num_def_levels = self.read_def_levels(
| ______________________________________^
211 | | &mut levels[levels_read..levels_read + iter_batch_size],
212 | | )?;
| |_____________________^
note: inside `<arrow::array_reader::ComplexObjectArrayReader<data_type::ByteArrayType, arrow::converter::ArrayRefConverter<std::vec::Vec<std::option::Option<data_type::ByteArray>>, arrow::array::GenericStringArray<i32>, arrow::converter::Utf8ArrayConverter>> as arrow::array_reader::ArrayReader>::next_batch` at parquet/src/arrow/array_reader.rs:490:17
--> parquet/src/arrow/array_reader.rs:490:17
|
490 | / self.column_reader.as_mut().unwrap().read_batch(
491 | | num_to_read,
492 | | cur_def_levels_buf,
493 | | cur_rep_levels_buf,
494 | | cur_data_buf,
495 | | )?;
| |_________________^
note: inside `arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels` at parquet/src/arrow/array_reader.rs:2229:21
--> parquet/src/arrow/array_reader.rs:2229:21
|
2229 | let array = array_reader.next_batch(values_per_page / 2).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at parquet/src/arrow/array_reader.rs:2154:5
--> parquet/src/arrow/array_reader.rs:2154:5
|
2153 | #[test]
| ------- in this procedural macro expansion
2154 | / fn test_complex_array_reader_def_and_rep_levels() {
2155 | | // Construct column schema
2156 | | let message_type = "
2157 | | message test_schema {
... |
2276 | | );
2277 | | }
| |_____^
= note: inside `<[closure@parquet/src/arrow/array_reader.rs:2154:5: 2277:6] as std::ops::FnOnce<()>>::call_once - shim` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `test::__rust_begin_short_backtrace::<fn()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:578:5
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:569:30
= note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1572:9
= note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:347:9
= note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
= note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
= note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
= note: inside `test::run_test_in_process` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:601:18
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:493:39
= note: inside `test::run_test::run_test_inner` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:531:13
= note: inside `test::run_test` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:28
= note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:306:17
= note: inside `test::run_tests_console` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5
= note: inside `test::test_main` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:123:15
= note: inside `test::test_main_static` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:142:5
= note: inside `main`
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
= note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
= note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
= note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
= note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
= note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
= note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
= note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
= note: inside `std::rt::lang_start_internal` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
= note: inside `std::rt::lang_start::<()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:62:5
= note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
```
**Describe alternatives you've considered**
The parquet2 crate in https://github.com/jorgecarleitao/parquet2 from @jorgecarleitao apparently already has parquet running under MIR this already running, so depending on if that gets integrated into the main arrow-rs repo, that might change how we go about getting a clean run
**Additional context**
Thanks to help from @roee88 , @jhorstmann and others who I probably missed, we now have MIRI checks validated for the arrow crate on each PR: https://github.com/apache/arrow-rs/runs/3154551440
Which runs a command such as:
```shell
cargo miri test -p arrow -- --skip csv --skip ipc --skip json
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on issue #614: Get MIRI running against parquet crate
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #614:
URL: https://github.com/apache/arrow-rs/issues/614#issuecomment-886250599
As noted by @houqp -- I originally filed this in the wrong repo by mistake: https://github.com/apache/arrow-datafusion/issues/773
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on issue #614: Get MIRI running against parquet crate
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #614:
URL: https://github.com/apache/arrow-rs/issues/614#issuecomment-886250668
🤔 though I tried to get a clean MIRI run against the parquet2 crate
```shell
(arrow_dev) alamb@MacBook-Pro:~/Software/parquet2$ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test -- --skip io
```
And I got some errors; I am probably running it incorrectly 😢
```
test read::levels::tests::test_get_bit_width ... ok
test read::metadata::tests::test_basics ... error: unsupported operation: can't call foreign function: strerror_r
--> /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:122:12
|
122 | if strerror_r(errno as c_int, p, buf.len()) < 0 {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function: strerror_r
|
= help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support
= note: inside `std::sys::unix::os::error_string` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:122:12
= note: inside `<std::io::error::Repr as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/error.rs:699:36
= note: inside `<std::io::Error as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/error.rs:65:9
= note: inside `<&dyn std::fmt::Debug as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:2036:62
= note: inside `std::fmt::write` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1115:17
= note: inside `<std::string::String as std::fmt::Write>::write_fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:187:9
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:481:22
= note: inside `std::option::Option::<std::string::String>::get_or_insert_with::<[closure@std::panicking::begin_panic_handler::PanicPayload::fill::{closure#0}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/option.rs:1270:26
= note: inside `std::panicking::begin_panic_handler::PanicPayload::fill` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:479:13
= note: inside `<std::panicking::begin_panic_handler::PanicPayload as core::panic::BoxMeUp>::get` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:497:13
= note: inside `std::panicking::rust_panic_with_hook` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:621:34
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:519:13
= note: inside `std::sys_common::backtrace::__rust_end_short_backtrace::<[closure@std::panicking::begin_panic_handler::{closure#0}], !>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:141:18
= note: inside `core::panicking::panic_fmt::panic_impl` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:515:5
= note: inside `core::panicking::panic_fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panicking.rs:92:14
= note: inside `std::result::unwrap_failed` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic.rs:24:9
= note: inside `std::result::Result::<std::fs::File, std::io::Error>::unwrap` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1281:23
note: inside `read::metadata::tests::test_basics` at src/read/metadata.rs:188:24
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] jhorstmann commented on issue #614: Get MIRI running against parquet crate
Posted by GitBox <gi...@apache.org>.
jhorstmann commented on issue #614:
URL: https://github.com/apache/arrow-rs/issues/614#issuecomment-886259653
Not really familiar with that code, but there is a FIXME comment about the alignment after pointer casts here: https://github.com/apache/arrow-rs/blob/1d6c37452f74a6443d269d0bdfc2c3738130fd5e/parquet/src/util/bit_util.rs#L556
If the alignment can't be guaranteed to be valid for u32, one possible solution could be to replace all `*in_buf` in bit_packing.rs with `std::ptr::read_unaligned(in_buf)`. On x86 that would not have any performance impact.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org