You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/25 20:01:44 UTC

[GitHub] [arrow-rs] alamb opened a new issue #614: Get MIRI running against parquet crate

alamb opened a new issue #614:
URL: https://github.com/apache/arrow-rs/issues/614


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   The MIRI tool helps crates ensure they have no undefined behavior https://github.com/rust-lang/miri
   
   To improve the stability / security of the parquet crate, it would be cool to run MIRI against that too.
   
   **Describe the solution you'd like**
   
   Ideally, we would run same kind of MIRI checks on the parquet crate as we do now on arrow: https://github.com/apache/arrow-rs/blob/master/.github/workflows/miri.yaml#L60
   
   This means a command something like
   
   ```shell
   cargo +nightly miri  test  -p parquet
   ```
   
   Sadly, when I run this locally the very first test fails with an alignment issue
   
   ```shell
   (arrow_dev) alamb@MacBook-Pro:~/Software/arrow-rs$ cargo +nightly miri  test  -p parquet
   WARNING: Ignoring `RUSTC_WRAPPER` environment variable, Miri does not support wrapping.
      Compiling arrow v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/arrow)
      Compiling parquet v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/parquet)
       Finished test [unoptimized + debuginfo] target(s) in 4.53s
        Running unittests (target/miri/x86_64-apple-darwin/debug/deps/parquet-7056b5943fd0017c)
   
   running 456 tests
   test arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ... error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required
       --> parquet/src/util/bit_packing.rs:150:12
        |
   150  |     *out = (*in_buf) % (1u32 << 2);
        |            ^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required
        |
        = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
        = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
                
        = note: inside `util::bit_packing::unpack2_32` at parquet/src/util/bit_packing.rs:150:12
   note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:37:14
       --> parquet/src/util/bit_packing.rs:37:14
        |
   37   |         2 => unpack2_32(in_ptr, out_ptr),
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside `util::bit_util::BitReader::get_batch::<i16>` at parquet/src/util/bit_util.rs:568:30
       --> parquet/src/util/bit_util.rs:568:30
        |
   568  |                     in_ptr = unpack32(in_ptr, out_ptr, num_bits);
        |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside `encodings::rle::RleDecoder::get_batch::<i16>` at parquet/src/encodings/rle.rs:415:30
       --> parquet/src/encodings/rle.rs:415:30
        |
   415  |                   num_values = bit_reader.get_batch::<T>(
        |  ______________________________^
   416  | |                     &mut buffer[values_read..values_read + num_values],
   417  | |                     self.bit_width as usize,
   418  | |                 );
        | |_________________^
   note: inside `encodings::levels::LevelDecoder::get` at parquet/src/encodings/levels.rs:262:35
       --> parquet/src/encodings/levels.rs:262:35
        |
   262  |                 let values_read = decoder.get_batch::<i16>(&mut buffer[0..len])?;
        |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_def_levels` at parquet/src/column/reader.rs:459:9
       --> parquet/src/column/reader.rs:459:9
        |
   459  |         level_decoder.get(buffer)
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_batch` at parquet/src/column/reader.rs:210:38
       --> parquet/src/column/reader.rs:210:38
        |
   210  |                       num_def_levels = self.read_def_levels(
        |  ______________________________________^
   211  | |                         &mut levels[levels_read..levels_read + iter_batch_size],
   212  | |                     )?;
        | |_____________________^
   note: inside `<arrow::array_reader::ComplexObjectArrayReader<data_type::ByteArrayType, arrow::converter::ArrayRefConverter<std::vec::Vec<std::option::Option<data_type::ByteArray>>, arrow::array::GenericStringArray<i32>, arrow::converter::Utf8ArrayConverter>> as arrow::array_reader::ArrayReader>::next_batch` at parquet/src/arrow/array_reader.rs:490:17
       --> parquet/src/arrow/array_reader.rs:490:17
        |
   490  | /                 self.column_reader.as_mut().unwrap().read_batch(
   491  | |                     num_to_read,
   492  | |                     cur_def_levels_buf,
   493  | |                     cur_rep_levels_buf,
   494  | |                     cur_data_buf,
   495  | |                 )?;
        | |_________________^
   note: inside `arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels` at parquet/src/arrow/array_reader.rs:2229:21
       --> parquet/src/arrow/array_reader.rs:2229:21
        |
   2229 |         let array = array_reader.next_batch(values_per_page / 2).unwrap();
        |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside closure at parquet/src/arrow/array_reader.rs:2154:5
       --> parquet/src/arrow/array_reader.rs:2154:5
        |
   2153 |       #[test]
        |       ------- in this procedural macro expansion
   2154 | /     fn test_complex_array_reader_def_and_rep_levels() {
   2155 | |         // Construct column schema
   2156 | |         let message_type = "
   2157 | |         message test_schema {
   ...    |
   2276 | |         );
   2277 | |     }
        | |_____^
        = note: inside `<[closure@parquet/src/arrow/array_reader.rs:2154:5: 2277:6] as std::ops::FnOnce<()>>::call_once - shim` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
        = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
        = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:578:5
        = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:569:30
        = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
        = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1572:9
        = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:347:9
        = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
        = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
        = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
        = note: inside `test::run_test_in_process` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:601:18
        = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:493:39
        = note: inside `test::run_test::run_test_inner` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:531:13
        = note: inside `test::run_test` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:28
        = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:306:17
        = note: inside `test::run_tests_console` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5
        = note: inside `test::test_main` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:123:15
        = note: inside `test::test_main_static` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:142:5
        = note: inside `main`
        = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
        = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
        = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
        = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
        = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
        = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
        = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
        = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
        = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
        = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
        = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
        = note: inside `std::rt::lang_start_internal` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
        = note: inside `std::rt::lang_start::<()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:62:5
        = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
   
   
   ```
   
   **Describe alternatives you've considered**
   The parquet2 crate in https://github.com/jorgecarleitao/parquet2 from @jorgecarleitao apparently already has parquet running under MIR this already running, so depending on if that gets integrated into the main arrow-rs repo, that might change how we go about getting a clean run
   
   
   
   **Additional context**
   Thanks to help from @roee88 , @jhorstmann  and others who I probably missed, we now have MIRI checks validated for the arrow crate on each PR: https://github.com/apache/arrow-rs/runs/3154551440
   
   Which runs a command such as:
   ```shell
     cargo miri test -p arrow -- --skip csv --skip ipc --skip json
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #614: Get MIRI running against parquet crate

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #614:
URL: https://github.com/apache/arrow-rs/issues/614#issuecomment-886250599


   As noted by @houqp  -- I originally filed this in the wrong repo by mistake: https://github.com/apache/arrow-datafusion/issues/773


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #614: Get MIRI running against parquet crate

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #614:
URL: https://github.com/apache/arrow-rs/issues/614#issuecomment-886250668


   🤔  though I tried to get a clean MIRI run against the parquet2 crate 
   
   ```shell
   (arrow_dev) alamb@MacBook-Pro:~/Software/parquet2$ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test -- --skip io
   ```
   
   And I got some errors; I am probably running it incorrectly 😢 
   
   ```
   test read::levels::tests::test_get_bit_width ... ok
   test read::metadata::tests::test_basics ... error: unsupported operation: can't call foreign function: strerror_r
      --> /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:122:12
       |
   122 |         if strerror_r(errno as c_int, p, buf.len()) < 0 {
       |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function: strerror_r
       |
       = help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support
               
       = note: inside `std::sys::unix::os::error_string` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:122:12
       = note: inside `<std::io::error::Repr as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/error.rs:699:36
       = note: inside `<std::io::Error as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/error.rs:65:9
       = note: inside `<&dyn std::fmt::Debug as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:2036:62
       = note: inside `std::fmt::write` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1115:17
       = note: inside `<std::string::String as std::fmt::Write>::write_fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:187:9
       = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:481:22
       = note: inside `std::option::Option::<std::string::String>::get_or_insert_with::<[closure@std::panicking::begin_panic_handler::PanicPayload::fill::{closure#0}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/option.rs:1270:26
       = note: inside `std::panicking::begin_panic_handler::PanicPayload::fill` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:479:13
       = note: inside `<std::panicking::begin_panic_handler::PanicPayload as core::panic::BoxMeUp>::get` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:497:13
       = note: inside `std::panicking::rust_panic_with_hook` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:621:34
       = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:519:13
       = note: inside `std::sys_common::backtrace::__rust_end_short_backtrace::<[closure@std::panicking::begin_panic_handler::{closure#0}], !>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:141:18
       = note: inside `core::panicking::panic_fmt::panic_impl` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:515:5
       = note: inside `core::panicking::panic_fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panicking.rs:92:14
       = note: inside `std::result::unwrap_failed` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic.rs:24:9
       = note: inside `std::result::Result::<std::fs::File, std::io::Error>::unwrap` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1281:23
   note: inside `read::metadata::tests::test_basics` at src/read/metadata.rs:188:24
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on issue #614: Get MIRI running against parquet crate

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on issue #614:
URL: https://github.com/apache/arrow-rs/issues/614#issuecomment-886259653


   Not really familiar with that code, but there is a FIXME comment about the alignment after pointer casts here: https://github.com/apache/arrow-rs/blob/1d6c37452f74a6443d269d0bdfc2c3738130fd5e/parquet/src/util/bit_util.rs#L556
   
   If the alignment can't be guaranteed to be valid for u32, one possible solution could be to replace all `*in_buf` in bit_packing.rs with `std::ptr::read_unaligned(in_buf)`. On x86 that would not have any performance impact.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org