Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/26 15:05:17 UTC

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #3828: Enable Parquet Row and Page Filtering by default (WIP)

Ted-Jiang commented on PR #3828:
URL: https://github.com/apache/arrow-datafusion/pull/3828#issuecomment-1328062248

   > Specifically made the parquet files like this:
   > 
   > ```
   > RUSTFLAGS="-C target-cpu=native" cargo run --release --bin tpch -- convert --input ~/tpch_data/data_SF1 --output ~/tpch_data/parquet_data_SF1 --format=parquet
   > ```
   > 
   > And then ran
   > 
   > ```
   > RUSTFLAGS="-C target-cpu=native" cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ~/tpch_data/parquet_data_SF1 --format parquet --batch-size 4096          
   > 
   >     Finished release [optimized] target(s) in 0.28s
   >      Running `target/release/tpch benchmark datafusion --iterations 3 --path /home/alamb/tpch_data/parquet_data_SF1 --format parquet --batch-size 4096`
   > Running benchmarks with the following options: DataFusionBenchmarkOpt { query: None, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "/home/alamb/tpch_data/parquet_data_SF1", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: false, enable_scheduler: false }
   > Query 1 iteration 0 took 1511.2 ms and returned 4 rows
   > Query 1 iteration 1 took 1372.2 ms and returned 4 rows
   > Query 1 iteration 2 took 1419.7 ms and returned 4 rows
   > Query 1 avg time: 1434.38 ms
   > thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value', datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs:129:27
   > note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   > Error: ArrowError(ExternalError(ArrowError(ExternalError("Arrow error: External error: Execution error: Arrow error: External error: Arrow error: External error: Execution error: Arrow error: External error: Execution error: Join Error: task 218 panicked"))))
   > alamb@aal-dev:~/arrow-datafusion$ 
   > ```
   > 
   > FYI @Ted-Jiang -- haven't had a chance to file this as a ticket or look more carefully into it
   
   Thanks for testing this. I will try to figure it out tomorrow.

