Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/01 15:13:18 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #467: Automatically find PARQUET_TEST_DATA and ARROW_TEST_DATA

alamb opened a new issue #467:
URL: https://github.com/apache/arrow-datafusion/issues/467


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   If you are new to datafusion, it may not be clear that the tests only run successfully when the `PARQUET_TEST_DATA` and `ARROW_TEST_DATA` environment variables are set.
   
   So today, here is what happens:
   ```
   git clone https://github.com/apache/arrow-datafusion
   cd arrow-datafusion
   cargo test -p datafusion
   ```
   
   Which results in many errors like:
   ```
   ---- physical_plan::windows::tests::window_function_input_partition stdout ----
   thread 'physical_plan::windows::tests::window_function_input_partition' panicked at 'failed to get arrow data dir: env `ARROW_TEST_DATA` is undefined or has empty value, and the pre-defined data dir `/Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-4.2.0/../testing/data` not found
   HINT: try running `git submodule update --init`', /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-4.2.0/src/util/test_util.rs:81:21
   ```
   
   Even running `git submodule update --init`, as the hint suggests, does not fix this. Instead, you need to set:
   ```
   export ARROW_TEST_DATA=testing/data
   export PARQUET_TEST_DATA=parquet-testing/data
   cargo test -p datafusion
   ```
   
   **Describe the solution you'd like**
   I would like the tests to automatically try the default locations, as above, if `ARROW_TEST_DATA` and `PARQUET_TEST_DATA` are not set.
   
   The tests should pass successfully with only these commands:
   ```
   git clone https://github.com/apache/arrow-datafusion
   cd arrow-datafusion
   git submodule update --init
   cargo test -p datafusion
   ```
   
   The arrow-rs crate already does this ([here](https://github.com/apache/arrow-rs/blob/master/arrow/src/util/test_util.rs#L100) and [here](https://github.com/apache/arrow-rs/blob/master/arrow/src/util/test_util.rs#L78)), but it stopped working now that arrow-rs and datafusion are no longer in the same workspace.
   
   Perhaps we can simply port the code from arrow-rs into datafusion rather than calling `arrow::util::test_util`.
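
A minimal sketch of what such a lookup might look like, assuming the behavior described above: a non-empty environment variable takes precedence, and otherwise a default directory relative to some base directory is tried. The helper name `get_data_dir`, its signature, and its error strings are hypothetical and are not the actual arrow-rs or datafusion code.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the arrow-rs or datafusion API).
/// Resolves a test data directory: a non-empty environment variable wins,
/// otherwise `default_rel` is looked up under `base_dir`.
fn get_data_dir(env_name: &str, base_dir: &Path, default_rel: &str) -> Result<PathBuf, String> {
    // 1. Honor the environment variable if it is set and non-empty.
    if let Ok(dir) = env::var(env_name) {
        let trimmed = dir.trim();
        if !trimmed.is_empty() {
            let path = PathBuf::from(trimmed);
            return if path.is_dir() {
                Ok(path)
            } else {
                Err(format!(
                    "the data dir `{}` defined by env `{}` was not found",
                    path.display(),
                    env_name
                ))
            };
        }
    }

    // 2. Otherwise fall back to the default location (e.g. a git submodule).
    let path = base_dir.join(default_rel);
    if path.is_dir() {
        Ok(path)
    } else {
        Err(format!(
            "env `{}` is undefined or empty, and the default data dir `{}` was not found\n\
             HINT: try running `git submodule update --init`",
            env_name,
            path.display()
        ))
    }
}
```

A test helper could then call something like `get_data_dir("ARROW_TEST_DATA", Path::new(env!("CARGO_MANIFEST_DIR")), "../testing/data")`, and the equivalent for `PARQUET_TEST_DATA`. The relative paths here are assumptions about where the submodules sit relative to the crate and would need to be checked against the actual repository layout.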
   
   **Describe alternatives you've considered**
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed issue #467: Automatically find PARQUET_TEST_DATA and ARROW_TEST_DATA

Posted by GitBox <gi...@apache.org>.
alamb closed issue #467:
URL: https://github.com/apache/arrow-datafusion/issues/467


   

