You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/21 11:56:52 UTC

[GitHub] [arrow] mqy opened a new pull request #8967: ARROW-10967: [Rust] Add mod arrow::util::test_data_dir

mqy opened a new pull request #8967:
URL: https://github.com/apache/arrow/pull/8967


   Two env vars ARROW_TEST_DATA and PARQUET_TEST_DATA are required to be set, for running tests,  benchmark.
   
   The major usage likes this: 
   ```
   let testdata = std::env::var("PARQUET_TEST_DATA").expect("PARQUET_TEST_DATA not defined"); 
   ```
   
   Some codes gets project root dir by appending relative dir "../../" to current dir.
   
   I'm thinking that, it would be better to add several public utility functions for getting test data dir. 
   
   Basic design is:
   
   - If env is defined  and the value points to existing dir, then we get it;
   - Else: get project root dir by run `git rev-parse --show-toplevel`, then join relative data dir.
   
   This PR adds a new mod arrow::util::test_data_dir, with two public functions, return String or panic on error.
   
   ```
   - ARROW_TEST_DATA()
   - PARQUET_TEST_DATA()
   ```
   
   Example usage:
   ```
   let testdata = ARROW_TEST_DATA();
   ```
   
   Example panic errors:
   
   ```
   - ARROW_TEST_DATA(): `non/existing` defined by env `ARROW_TEST_DATA` is not directory
   - ARROW_TEST_DATA(): `non/existing` defined by env `ARROW_TEST_DATA` not exists or invalid: <err>
   - ARROW_TEST_DATA(): failed to get git top-level dir with `git rev-parse --show-toplevel`
   - ARROW_TEST_DATA(): submodule dir `/path/to/arrow/testing` not exists, did you run `git submodule update --init`?
   - ARROW_TEST_DATA(): submodule data dir `/path/to/arrow/testing/data` not exists, did you run `git submodule update --init`?
   ```
   
   To avoid unexpected breaking to incoming PRs, I reverted previous changes to README.md and .rs files that use ARROW_TEST_DATA or PARQUET_TEST_DATA.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-750041766


   @andygrove @alamb @jorgecarleitao
   
   I updated code and PR: drop `git`, add env `CARGO_MANIFEST_DIR`. Good news: is should be better than the pure env solution and the codes get cleaner than ever.
   
   **Coverage (amd64, stable)** fails at `test encodings::encoding::tests::test_bool ... FAILED`, but I can't re-produce this failure on local macOS with `cargo +stable test`. I think this failure is not introduced in this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#discussion_r546324360



##########
File path: rust/arrow/src/ipc/reader.rs
##########
@@ -939,12 +939,13 @@ mod tests {
     use flate2::read::GzDecoder;
 
     use crate::util::integration_util::*;
-    use std::env;
     use std::fs::File;
 
+    use crate::util::test_data_dir::ARROW_TEST_DATA;
+
     #[test]
     fn read_generated_files() {
-        let testdata = env::var("ARROW_TEST_DATA").expect("ARROW_TEST_DATA not defined");

Review comment:
       @andygrove Make sense, let me refined the errors.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy closed pull request #8967: ARROW-10967: [Rust] Add mod arrow::util::test_data_dir

Posted by GitBox <gi...@apache.org>.
mqy closed pull request #8967:
URL: https://github.com/apache/arrow/pull/8967


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8967: ARROW-10967: [Rust] Add mod arrow::util::test_data_dir

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (9574571) into [master](https://codecov.io/gh/apache/arrow/commit/a054c78813c68b99abc0df9db87ca1e530d324d7?el=desc) (a054c78) will **decrease** coverage by `0.50%`.
   > The diff coverage is `82.05%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   - Coverage   83.20%   82.69%   -0.51%     
   ==========================================
     Files         199      201       +2     
     Lines       48857    49639     +782     
   ==========================================
   + Hits        40651    41050     +399     
   - Misses       8206     8589     +383     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/util/test\_data\_dir.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X2RhdGFfZGlyLnJz) | `82.05% <82.05%> (ø)` | |
   | [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.49% <0.00%> (-1.51%)` | :arrow_down: |
   | [rust/arrow/src/ipc/convert.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2NvbnZlcnQucnM=) | `91.90% <0.00%> (-1.11%)` | :arrow_down: |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `82.86% <0.00%> (-0.84%)` | :arrow_down: |
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `90.62% <0.00%> (-0.70%)` | :arrow_down: |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `89.42% <0.00%> (-0.57%)` | :arrow_down: |
   | [rust/arrow-flight/src/utils.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy1mbGlnaHQvc3JjL3V0aWxzLnJz) | `0.00% <0.00%> (ø)` | |
   | [rust/benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9iZW5jaG1hcmtzL3NyYy9iaW4vdHBjaC5ycw==) | `0.00% <0.00%> (ø)` | |
   | ... and [17 more](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [a054c78...9574571](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #8967: ARROW-10967: [Rust] Add mod arrow::util::test_data_dir

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748636219


   @andygrove finally I force pushed just the newly added mod. PR desc was updated too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749648137


   > I wonder if we can use a Cargo environment variable rather than calling out to git to find the appropriate paths. For example, I think env!("CARGO_MANIFEST_DIR") is the directory containing Cargo.toml.
   
   @alamb I [The Cargo Book :: Environment Variables](https://doc.rust-lang.org/cargo/reference/environment-variables.html) states that:
   ```
   CARGO_MANIFEST_DIR — The directory containing the manifest for **the package** being built (the package containing the build script). Also note that this is the value of the current working directory of the build script when it starts.
   ```
   I've verified with unit test in the "arrow" package, that `env!("CARGO_MANIFEST_DIR")` equals to `env::current_dir().unwrap().display().to_string()` -- the dir "rust/arrow".
   
   Actually, the most simplest implementation to get "git toplevel dir" is to append "../../" to current dir at present.
   
   I did tried various ways to get the `git top level dir`: 
   
   - find ".git" file or dir from parents of current dir
   - find  dir "rust" from parents of current dir
   - use `git` command 
   
   I prefer that `arrow_test_data()` and `parquet_test_dir()` SHOULD NOT depends on the actual dir structure: workspace dir "rust/" or any existing package dir ("parquet/" for example), or possible deeper nesting dirs. The major reason is: these data dirs are **defined (controlled)** by `git submodule` for all sub projects,  it's none of the business of underlying dir structures in "apache arrow".
   
   Thank you for your review and comments!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749652377


   > as long as there is a reasonable error message that explains what environment variables to set when that fails (e.g. git isn't installed or something)
   
   Good point, let me update the error message.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749651061


   > I would personally be happy to accept this PR (based on calling git or something else) if simply running cargo test would pass all tests without having to set the *_TEST_DATA variables ( I think you would just need to make the changes you suggested of "Existing codes can be updated in this way :")
   
   Actually, in previous commits (that was reverted), I had updated other codes, but sometimes this PR fails in CI tests, caused by code merge, typical error is about unknown `env`: `std::env` was delete by this PR.
   
   @alamb If I understood correctly, are you suggesting me update existing codes that call `env::("*_TEST_DIR")` ?
   If so, I'll commit files (including to README.md) for review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#discussion_r547916595



##########
File path: rust/arrow/src/util/test_util.rs
##########
@@ -60,3 +60,134 @@ pub fn get_temp_file(file_name: &str, content: &[u8]) -> fs::File {
     assert!(file.is_ok());
     file.unwrap()
 }
+
+/// Gets arrow test data dir, by optional env `ARROW_TEST_DATA` or the default
+/// `../../testing/data`.
+/// It panics when failed to get dir.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::arrow_test_data();
+/// let csvdata = format!("{}/csv/aggregate_test_100.csv", testdata);
+/// assert!(std::path::PathBuf::from(csvdata).exists());
+/// ```
+pub fn arrow_test_data() -> String {
+    match get_data_dir("ARROW_TEST_DATA", "../../testing/data") {
+        Ok(pb) => pb.display().to_string(),
+        Err(err) => panic!(format!("failed to get arrow data dir: {}", err)),
+    }
+}
+
+/// Gets parquet test data dir, by optional env `PARQUET_TEST_DATA` or the default
+/// `../../cpp/submodules/parquet-testing/data`.
+/// It panics when failed to get dir.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::parquet_test_data();
+/// let filename = format!("{}/binary.parquet", testdata);
+/// assert!(std::path::PathBuf::from(filename).exists());
+/// ```
+pub fn parquet_test_data() -> String {
+    match get_data_dir(
+        "PARQUET_TEST_DATA",
+        "../../cpp/submodules/parquet-testing/data",
+    ) {
+        Ok(pb) => pb.display().to_string(),
+        Err(err) => panic!(format!("failed to get parquet data dir: {}", err)),
+    }
+}
+
+/// get_data_dir is the helper function for `arrow_test_data` and `arrow_test_data`.
+fn get_data_dir(udf_env: &str, submodule_data: &str) -> Result<PathBuf, Box<dyn Error>> {
+    // Try user defined env.
+    if let Ok(dir) = env::var(udf_env) {
+        let trimmed = dir.trim().to_string();
+        if !trimmed.is_empty() {
+            let pb = PathBuf::from(trimmed);
+            if pb.is_dir() {
+                return Ok(pb);
+            } else {
+                return Err(format!(
+                    "the data dir `{}` defined by env {} not found",
+                    pb.display().to_string(),
+                    udf_env
+                )
+                .into());
+            }
+        }
+    }
+
+    // The env is undefined or it's value is trimmed to empty, let's try default dir.
+
+    // env "CARGO_MANIFEST_DIR" is "the directory containing the manifest of your package",
+    // set by `cargo run` or `cargo test`, see:
+    // https://doc.rust-lang.org/cargo/reference/environment-variables.html
+    let dir = env!("CARGO_MANIFEST_DIR");
+
+    let pb = PathBuf::from(dir).join(submodule_data);
+    if pb.is_dir() {
+        Ok(pb)
+    } else {
+        Err(format!(
+            "env `{}` is undefined or has empty value, and the pre-defined data dir `{}` not found", 
+            udf_env,
+            pb.display().to_string(),
+        ).into())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::env;
+
+    #[test]
+    fn test_data_dir() {
+        let udf_env = "get_data_dir";
+        let cwd = env::current_dir().unwrap();
+
+        let existing_pb = cwd.join("..");
+        let existing = existing_pb.display().to_string();
+        let existing_str = existing.as_str();
+
+        let non_existing = cwd.join("non-existing-dir").display().to_string();
+        let non_existing_str = non_existing.as_str();
+
+        env::set_var(udf_env, non_existing_str);
+        let res = get_data_dir(udf_env, existing_str);
+        debug_assert!(res.is_err());

Review comment:
       I would personally recommend using `assert!` here and below rather than debug assert -- but it probably makes no practical difference since we would run these tests only on debug builds.
   
   ```suggestion
           debug!(res.is_err());
   ```

##########
File path: rust/arrow/src/util/test_util.rs
##########
@@ -60,3 +60,134 @@ pub fn get_temp_file(file_name: &str, content: &[u8]) -> fs::File {
     assert!(file.is_ok());
     file.unwrap()
 }
+
+/// Gets arrow test data dir, by optional env `ARROW_TEST_DATA` or the default
+/// `../../testing/data`.
+/// It panics when failed to get dir.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::arrow_test_data();
+/// let csvdata = format!("{}/csv/aggregate_test_100.csv", testdata);
+/// assert!(std::path::PathBuf::from(csvdata).exists());
+/// ```
+pub fn arrow_test_data() -> String {
+    match get_data_dir("ARROW_TEST_DATA", "../../testing/data") {
+        Ok(pb) => pb.display().to_string(),
+        Err(err) => panic!(format!("failed to get arrow data dir: {}", err)),
+    }
+}
+
+/// Gets parquet test data dir, by optional env `PARQUET_TEST_DATA` or the default
+/// `../../cpp/submodules/parquet-testing/data`.
+/// It panics when failed to get dir.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::parquet_test_data();
+/// let filename = format!("{}/binary.parquet", testdata);
+/// assert!(std::path::PathBuf::from(filename).exists());
+/// ```
+pub fn parquet_test_data() -> String {
+    match get_data_dir(
+        "PARQUET_TEST_DATA",
+        "../../cpp/submodules/parquet-testing/data",
+    ) {
+        Ok(pb) => pb.display().to_string(),
+        Err(err) => panic!(format!("failed to get parquet data dir: {}", err)),
+    }
+}
+
+/// get_data_dir is the helper function for `arrow_test_data` and `arrow_test_data`.
+fn get_data_dir(udf_env: &str, submodule_data: &str) -> Result<PathBuf, Box<dyn Error>> {

Review comment:
       ```suggestion
   /// Returns a directory path for finding test data. 
   ///
   /// udf_env: name of an environment variable
   ///
   /// submodule_dir: fallback path (relative to CARGO_MANIFEST_DIR)
   /// 
   ///  Returns either:
   /// The path referred to in `udf_env` if that variable is set and refers to a directory
   /// The submodule_data directory relative to CARGO_MANIFEST_PATH (i.e.`$CARGO_MANIFEST_DIR/$submodule_data`)
   fn get_data_dir(udf_env: &str, submodule_data: &str) -> Result<PathBuf, Box<dyn Error>> {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (1365c35) into [master](https://codecov.io/gh/apache/arrow/commit/a054c78813c68b99abc0df9db87ca1e530d324d7?el=desc) (a054c78) will **decrease** coverage by `0.55%`.
   > The diff coverage is `84.84%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   - Coverage   83.20%   82.65%   -0.56%     
   ==========================================
     Files         199      200       +1     
     Lines       48857    49851     +994     
   ==========================================
   + Hits        40651    41203     +552     
   - Misses       8206     8648     +442     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/util/test\_util.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X3V0aWwucnM=) | `83.55% <84.84%> (+8.55%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
   | [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | [rust/arrow/src/json/reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvanNvbi9yZWFkZXIucnM=) | `81.49% <0.00%> (-1.60%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.49% <0.00%> (-1.51%)` | :arrow_down: |
   | [rust/arrow/src/ipc/convert.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2NvbnZlcnQucnM=) | `91.90% <0.00%> (-1.11%)` | :arrow_down: |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `82.86% <0.00%> (-0.84%)` | :arrow_down: |
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `90.62% <0.00%> (-0.70%)` | :arrow_down: |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `89.42% <0.00%> (-0.57%)` | :arrow_down: |
   | ... and [23 more](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [a054c78...65eede3](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748459379


   https://issues.apache.org/jira/browse/ARROW-10967


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-750041766


   @andygrove @alamb @jorgecarleitao
   
   I updated code and PR: drop `git`, add env `CARGO_MANIFEST_DIR`. Good news: it should be better than the pure env solution and the codes get cleaner than ever.
   
   **Coverage (amd64, stable)** fails at `test encodings::encoding::tests::test_bool ... FAILED`, but I can't re-produce this failure on local macOS with `cargo +stable test`. I think this failure is not introduced in this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8967: ARROW-10967: [Rust] Add mod arrow::util::test_data_dir

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#discussion_r547078506



##########
File path: rust/arrow/src/util/test_data_dir.rs
##########
@@ -0,0 +1,258 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Utils for getting "external" test data dirs for arrow and parquet, to make testing,
+//! benchmark and example easier.
+
+use std::{env, error::Error, path::PathBuf, process::Command};
+
+const ARROW_ENV: &str = "ARROW_TEST_DATA";
+const ARROW_SUBMODULE: &str = "testing";
+const ARROW_SUBMODULE_DATA: &str = "data";
+
+const PARQUET_ENV: &str = "PARQUET_TEST_DATA";
+const PARQUET_SUBMODULE: &str = "cpp/submodules/parquet-testing";
+const PARQUET_SUBMODULE_DATA: &str = "data";
+
+/// Gets arrow test data dir. Panic on error.
+#[allow(non_snake_case)]
+pub fn ARROW_TEST_DATA() -> String {

Review comment:
       nit: I would not have used upper case. The fact that the env variable is upper case is irrelevant to the function name. If we plan to have all tests use this, then the user (writing tests) would not even understand why the function is upper case as the env variable becomes opaque to them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8967: ARROW-10967: [Rust] Add mod arrow::util::test_data_dir

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (bf26986) into [master](https://codecov.io/gh/apache/arrow/commit/a054c78813c68b99abc0df9db87ca1e530d324d7?el=desc) (a054c78) will **decrease** coverage by `0.05%`.
   > The diff coverage is `85.82%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   - Coverage   83.20%   83.14%   -0.06%     
   ==========================================
     Files         199      201       +2     
     Lines       48857    49178     +321     
   ==========================================
   + Hits        40651    40889     +238     
   - Misses       8206     8289      +83     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/util/test\_data\_dir.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X2RhdGFfZGlyLnJz) | `85.82% <85.82%> (ø)` | |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.49% <0.00%> (-1.51%)` | :arrow_down: |
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `91.05% <0.00%> (-0.27%)` | :arrow_down: |
   | [rust/parquet/src/util/cursor.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL2N1cnNvci5ycw==) | `83.18% <0.00%> (ø)` | |
   | [rust/parquet/src/util/memory.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL21lbW9yeS5ycw==) | `89.57% <0.00%> (ø)` | |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `76.76% <0.00%> (ø)` | |
   | [rust/datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `88.12% <0.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `80.45% <0.00%> (ø)` | |
   | [...datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvaGFzaF9idWlsZF9wcm9iZV9vcmRlci5ycw==) | `59.09% <0.00%> (ø)` | |
   | ... and [8 more](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [a054c78...e9842ae](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748492695


   Thanks @mqy I like this approach.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-750180341


   
   > **Coverage (amd64, stable)** fails at `test encodings::encoding::tests::test_bool ... FAILED`, but I can't re-produce this failure on local macOS with `cargo +stable test`. I think this failure is not introduced in this PR.
   
   I think this is https://issues.apache.org/jira/browse/ARROW-10943 -- it is intermittently failing. I will retrigger the tests
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8967:
URL: https://github.com/apache/arrow/pull/8967


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749651061


   > I would personally be happy to accept this PR (based on calling git or something else) if simply running cargo test would pass all tests without having to set the *_TEST_DATA variables ( I think you would just need to make the changes you suggested of "Existing codes can be updated in this way :")
   
   @alamb If I understood correctly, are suggesting me update existing codes that call `env::("*_TEST_DIR")` ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#discussion_r547163664



##########
File path: rust/arrow/src/util/test_data_dir.rs
##########
@@ -0,0 +1,258 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Utils for getting "external" test data dirs for arrow and parquet, to make testing,
+//! benchmark and example easier.
+
+use std::{env, error::Error, path::PathBuf, process::Command};
+
+const ARROW_ENV: &str = "ARROW_TEST_DATA";
+const ARROW_SUBMODULE: &str = "testing";
+const ARROW_SUBMODULE_DATA: &str = "data";
+
+const PARQUET_ENV: &str = "PARQUET_TEST_DATA";
+const PARQUET_SUBMODULE: &str = "cpp/submodules/parquet-testing";
+const PARQUET_SUBMODULE_DATA: &str = "data";
+
+/// Gets arrow test data dir. Panic on error.
+#[allow(non_snake_case)]
+pub fn ARROW_TEST_DATA() -> String {

Review comment:
       Thanks @jorgecarleitao Good point!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749795438


   One observation about the use of git is that it won't help when we are validating a release candidate based on the source tarball.
   
   I think it would be better if we just relied on cargo env vars if that is possible.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (1242059) into [master](https://codecov.io/gh/apache/arrow/commit/0519c4c0ecccd7d84ce44bd3a3e7bcb4fef8f4d6?el=desc) (0519c4c) will **increase** coverage by `0.01%`.
   > The diff coverage is `96.49%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   + Coverage   82.64%   82.65%   +0.01%     
   ==========================================
     Files         200      200              
     Lines       49730    49787      +57     
   ==========================================
   + Hits        41098    41153      +55     
   - Misses       8632     8634       +2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/util/test\_util.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X3V0aWwucnM=) | `90.90% <96.49%> (+15.90%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [9bab12f...1242059](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749648137


   > I wonder if we can use a Cargo environment variable rather than calling out to git to find the appropriate paths. For example, I think env!("CARGO_MANIFEST_DIR") is the directory containing Cargo.toml.
   
   @alamb I [The Cargo Book :: Environment Variables](https://doc.rust-lang.org/cargo/reference/environment-variables.html) states that:
   ```
   CARGO_MANIFEST_DIR — The directory containing the manifest for **the package** being built (the package containing the build script). Also note that this is the value of the current working directory of the build script when it starts.
   ```
   I've verified with as unit test in the "arrow" package, that `env!("CARGO_MANIFEST_DIR")` equals `env::current_dir().unwrap().display().to_string()` -- the dir "rust/arrow".
   
   Actually, the most simplest implementation to get "git toplevel dir" is to append "../../" to current dir at present.
   
   I did tried various ways to get the `git top level dir`: 
   
   - find ".git" file or dir from parents of current dir
   - find  dir "rust" from parents of current dir
   - use `git` command 
   
   I prefer that `arrow_test_data()` and `parquet_test_dir()` SHOULD NOT depends on the actual dir structure: workspace dir "rust/" or any existing package dir ("parquet/" for example), or possible deeper nesting dirs. The major reason is: these data dirs are **defined (controlled)** by `git submodule` for all sub projects,  it's none of the business of underlying dir structures in "apache arrow".
   
   Thank you for your review and comments!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748636219


   @andygrove finally I force pushed just the newly add mod. PR desc was updated too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749553813


   > panic message: `"ARROW_TEST_DATA not defined: NotPresent"`
   
   @alamb thanks for review.
   
   I guess if you had `unset ARROW_TEST_DATA`,  `env::var("ARROW_TEST_DATA").expect("ARROW_TEST_DATA not defined");` should panic with that error. 
   
   I'll double check this.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (e632746) into [master](https://codecov.io/gh/apache/arrow/commit/a054c78813c68b99abc0df9db87ca1e530d324d7?el=desc) (a054c78) will **increase** coverage by `0.03%`.
   > The diff coverage is `95.45%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   + Coverage   83.20%   83.23%   +0.03%     
   ==========================================
     Files         199      200       +1     
     Lines       48857    48957     +100     
   ==========================================
   + Hits        40651    40751     +100     
     Misses       8206     8206              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/examples/flight\_server.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL2V4YW1wbGVzL2ZsaWdodF9zZXJ2ZXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/datasource/csv.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL2Nzdi5ycw==) | `75.00% <ø> (ø)` | |
   | [rust/arrow/src/util/test\_data\_dir.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X2RhdGFfZGlyLnJz) | `95.19% <95.19%> (ø)` | |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `83.70% <100.00%> (ø)` | |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `83.25% <100.00%> (ø)` | |
   | [rust/datafusion/src/datasource/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL3BhcnF1ZXQucnM=) | `95.62% <100.00%> (ø)` | |
   | [rust/datafusion/src/execution/dataframe\_impl.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vZGF0YWZyYW1lX2ltcGwucnM=) | `96.26% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/csv.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Nzdi5ycw==) | `82.78% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BhcnF1ZXQucnM=) | `79.78% <100.00%> (+0.21%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `80.45% <100.00%> (ø)` | |
   | ... and [5 more](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [a054c78...e632746](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#discussion_r547947552



##########
File path: rust/arrow/src/util/test_util.rs
##########
@@ -60,3 +60,134 @@ pub fn get_temp_file(file_name: &str, content: &[u8]) -> fs::File {
     assert!(file.is_ok());
     file.unwrap()
 }
+
+/// Gets arrow test data dir, by optional env `ARROW_TEST_DATA` or the default
+/// `../../testing/data`.
+/// It panics when failed to get dir.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::arrow_test_data();
+/// let csvdata = format!("{}/csv/aggregate_test_100.csv", testdata);
+/// assert!(std::path::PathBuf::from(csvdata).exists());
+/// ```
+pub fn arrow_test_data() -> String {
+    match get_data_dir("ARROW_TEST_DATA", "../../testing/data") {
+        Ok(pb) => pb.display().to_string(),
+        Err(err) => panic!(format!("failed to get arrow data dir: {}", err)),
+    }
+}
+
+/// Gets parquet test data dir, by optional env `PARQUET_TEST_DATA` or the default
+/// `../../cpp/submodules/parquet-testing/data`.
+/// It panics when failed to get dir.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::parquet_test_data();
+/// let filename = format!("{}/binary.parquet", testdata);
+/// assert!(std::path::PathBuf::from(filename).exists());
+/// ```
+pub fn parquet_test_data() -> String {
+    match get_data_dir(
+        "PARQUET_TEST_DATA",
+        "../../cpp/submodules/parquet-testing/data",
+    ) {
+        Ok(pb) => pb.display().to_string(),
+        Err(err) => panic!(format!("failed to get parquet data dir: {}", err)),
+    }
+}
+
+/// get_data_dir is the helper function for `arrow_test_data` and `arrow_test_data`.
+fn get_data_dir(udf_env: &str, submodule_data: &str) -> Result<PathBuf, Box<dyn Error>> {
+    // Try user defined env.
+    if let Ok(dir) = env::var(udf_env) {
+        let trimmed = dir.trim().to_string();
+        if !trimmed.is_empty() {
+            let pb = PathBuf::from(trimmed);
+            if pb.is_dir() {
+                return Ok(pb);
+            } else {
+                return Err(format!(
+                    "the data dir `{}` defined by env {} not found",
+                    pb.display().to_string(),
+                    udf_env
+                )
+                .into());
+            }
+        }
+    }
+
+    // The env is undefined or it's value is trimmed to empty, let's try default dir.
+
+    // env "CARGO_MANIFEST_DIR" is "the directory containing the manifest of your package",
+    // set by `cargo run` or `cargo test`, see:
+    // https://doc.rust-lang.org/cargo/reference/environment-variables.html
+    let dir = env!("CARGO_MANIFEST_DIR");
+
+    let pb = PathBuf::from(dir).join(submodule_data);
+    if pb.is_dir() {
+        Ok(pb)
+    } else {
+        Err(format!(
+            "env `{}` is undefined or has empty value, and the pre-defined data dir `{}` not found", 
+            udf_env,
+            pb.display().to_string(),
+        ).into())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::env;
+
+    #[test]
+    fn test_data_dir() {
+        let udf_env = "get_data_dir";
+        let cwd = env::current_dir().unwrap();
+
+        let existing_pb = cwd.join("..");
+        let existing = existing_pb.display().to_string();
+        let existing_str = existing.as_str();
+
+        let non_existing = cwd.join("non-existing-dir").display().to_string();
+        let non_existing_str = non_existing.as_str();
+
+        env::set_var(udf_env, non_existing_str);
+        let res = get_data_dir(udf_env, existing_str);
+        debug_assert!(res.is_err());

Review comment:
       Updated again: replaced `debug_assert` with `assert`.
   
   @alamb that's great, I appreciate all the kindness, guidance!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749648137


   > I wonder if we can use a Cargo environment variable rather than calling out to git to find the appropriate paths. For example, I think env!("CARGO_MANIFEST_DIR") is the directory containing Cargo.toml.
   
   @alamb I [The Cargo Book :: Environment Variables](https://doc.rust-lang.org/cargo/reference/environment-variables.html) states that:
   ```
   CARGO_MANIFEST_DIR — The directory containing the manifest for **the package** being built (the package containing the build script). Also note that this is the value of the current working directory of the build script when it starts.
   ```
   I've verified with unit test in the "arrow" package, that `env!("CARGO_MANIFEST_DIR")` equals to `env::current_dir().unwrap().display().to_string()` -- the dir "rust/arrow".
   
   Actually, the most simplest implementation to get "git toplevel dir" is to append "../../" to current dir at present.
   
   I have tried various ways to get the `git top level dir`: 
   
   - find ".git" file or dir from parents of current dir
   - find  dir "rust" from parents of current dir
   - use `git` command -- this is current solution
   
   I prefer that `arrow_test_data()` and `parquet_test_dir()` SHOULD NOT depends on the actual dir structure: workspace dir "rust/" or any existing package dir ("parquet/" for example), or possible deeper nesting dirs. The major reason is: these data dirs are **defined (controlled)** by `git submodule` for all sub projects,  it's none of the business of underlying dir structures in "apache arrow".
   
   Thank you for your review and comments!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#discussion_r546254073



##########
File path: rust/arrow/src/ipc/reader.rs
##########
@@ -939,12 +939,13 @@ mod tests {
     use flate2::read::GzDecoder;
 
     use crate::util::integration_util::*;
-    use std::env;
     use std::fs::File;
 
+    use crate::util::test_data_dir::ARROW_TEST_DATA;
+
     #[test]
     fn read_generated_files() {
-        let testdata = env::var("ARROW_TEST_DATA").expect("ARROW_TEST_DATA not defined");

Review comment:
       Could you post example output for when an example of test is run and the test data cannot be found? I expected the unwrap does show the detailed error message but just want to be sure since we're reminving these explicit messages here.

##########
File path: rust/arrow/src/ipc/reader.rs
##########
@@ -939,12 +939,13 @@ mod tests {
     use flate2::read::GzDecoder;
 
     use crate::util::integration_util::*;
-    use std::env;
     use std::fs::File;
 
+    use crate::util::test_data_dir::ARROW_TEST_DATA;
+
     #[test]
     fn read_generated_files() {
-        let testdata = env::var("ARROW_TEST_DATA").expect("ARROW_TEST_DATA not defined");

Review comment:
       Could you post example output for when an example of test is run and the test data cannot be found? I expected the unwrap does show the detailed error message but just want to be sure since we're removing these explicit messages here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy closed pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
mqy closed pull request #8967:
URL: https://github.com/apache/arrow/pull/8967


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (3ebffce) into [master](https://codecov.io/gh/apache/arrow/commit/a054c78813c68b99abc0df9db87ca1e530d324d7?el=desc) (a054c78) will **increase** coverage by `0.03%`.
   > The diff coverage is `95.45%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   + Coverage   83.20%   83.23%   +0.03%     
   ==========================================
     Files         199      200       +1     
     Lines       48857    48957     +100     
   ==========================================
   + Hits        40651    40750      +99     
   - Misses       8206     8207       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/examples/flight\_server.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL2V4YW1wbGVzL2ZsaWdodF9zZXJ2ZXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/datasource/csv.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL2Nzdi5ycw==) | `75.00% <ø> (ø)` | |
   | [rust/arrow/src/util/test\_data\_dir.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X2RhdGFfZGlyLnJz) | `95.19% <95.19%> (ø)` | |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `83.70% <100.00%> (ø)` | |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `83.25% <100.00%> (ø)` | |
   | [rust/datafusion/src/datasource/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL3BhcnF1ZXQucnM=) | `95.62% <100.00%> (ø)` | |
   | [rust/datafusion/src/execution/dataframe\_impl.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vZGF0YWZyYW1lX2ltcGwucnM=) | `96.26% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/csv.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Nzdi5ycw==) | `82.78% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BhcnF1ZXQucnM=) | `79.78% <100.00%> (+0.21%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `80.45% <100.00%> (ø)` | |
   | ... and [6 more](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [a054c78...3ebffce](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (ca83840) into [master](https://codecov.io/gh/apache/arrow/commit/519e9da4fc1698f686525f4226295f3680a3f3db?el=desc) (519e9da) will **decrease** coverage by `0.01%`.
   > The diff coverage is `95.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   - Coverage   83.26%   83.24%   -0.02%     
   ==========================================
     Files         196      197       +1     
     Lines       48192    48260      +68     
   ==========================================
   + Hits        40125    40173      +48     
   - Misses       8067     8087      +20     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/examples/flight\_server.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL2V4YW1wbGVzL2ZsaWdodF9zZXJ2ZXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/datasource/csv.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL2Nzdi5ycw==) | `75.00% <ø> (ø)` | |
   | [rust/arrow/src/util/test\_data\_dir.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X2RhdGFfZGlyLnJz) | `93.75% <93.75%> (ø)` | |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `83.70% <100.00%> (ø)` | |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `83.25% <100.00%> (ø)` | |
   | [rust/datafusion/src/datasource/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL3BhcnF1ZXQucnM=) | `95.62% <100.00%> (ø)` | |
   | [rust/datafusion/src/execution/dataframe\_impl.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vZGF0YWZyYW1lX2ltcGwucnM=) | `93.28% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/csv.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Nzdi5ycw==) | `82.78% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BhcnF1ZXQucnM=) | `79.78% <100.00%> (+0.21%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `80.54% <100.00%> (ø)` | |
   | ... and [7 more](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [5819943...ca83840](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#discussion_r546324360



##########
File path: rust/arrow/src/ipc/reader.rs
##########
@@ -939,12 +939,13 @@ mod tests {
     use flate2::read::GzDecoder;
 
     use crate::util::integration_util::*;
-    use std::env;
     use std::fs::File;
 
+    use crate::util::test_data_dir::ARROW_TEST_DATA;
+
     #[test]
     fn read_generated_files() {
-        let testdata = env::var("ARROW_TEST_DATA").expect("ARROW_TEST_DATA not defined");

Review comment:
       @andygrove Make sense, let's me refined the errors.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy edited a comment on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749648137


   > I wonder if we can use a Cargo environment variable rather than calling out to git to find the appropriate paths. For example, I think env!("CARGO_MANIFEST_DIR") is the directory containing Cargo.toml.
   
   @alamb [The Cargo Book :: Environment Variables](https://doc.rust-lang.org/cargo/reference/environment-variables.html) states that:
   ```
   CARGO_MANIFEST_DIR — The directory containing the manifest for **the package** being built (the package containing the build script). Also note that this is the value of the current working directory of the build script when it starts.
   ```
   I've verified with unit test in the "arrow" package, that `env!("CARGO_MANIFEST_DIR")` equals to `env::current_dir().unwrap().display().to_string()` -- the dir "rust/arrow".
   
   Actually, the most simplest implementation to get "git top level dir" is to append "../../" to current dir at present.
   
   I have tried various ways to get the `git top level dir`: 
   
   - find ".git" file or dir from parents of current dir
   - find  dir "rust" from parents of current dir
   - use `git` command -- this is current solution
   
   I prefer that `arrow_test_data()` and `parquet_test_dir()` SHOULD NOT depend on the actual dir structure: workspace dir "rust/" or any existing package dir ("parquet/" for example), or possible deeper nesting dirs. The major reason is: these data dirs are **defined (controlled)** by `git submodule` for all sub projects,  it's none of the business of underlying dir structures in "apache arrow".
   
   Thank you for your review and comments!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (e1b30ef) into [master](https://codecov.io/gh/apache/arrow/commit/519e9da4fc1698f686525f4226295f3680a3f3db?el=desc) (519e9da) will **decrease** coverage by `0.01%`.
   > The diff coverage is `95.31%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   - Coverage   83.26%   83.24%   -0.02%     
   ==========================================
     Files         196      200       +4     
     Lines       48192    48808     +616     
   ==========================================
   + Hits        40125    40629     +504     
   - Misses       8067     8179     +112     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/examples/flight\_server.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL2V4YW1wbGVzL2ZsaWdodF9zZXJ2ZXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/datasource/csv.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL2Nzdi5ycw==) | `75.00% <ø> (ø)` | |
   | [rust/datafusion/src/datasource/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL3BhcnF1ZXQucnM=) | `95.62% <ø> (ø)` | |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `80.45% <ø> (-0.10%)` | :arrow_down: |
   | [rust/arrow/src/util/test\_data\_dir.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X2RhdGFfZGlyLnJz) | `95.19% <95.19%> (ø)` | |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `83.70% <100.00%> (ø)` | |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `83.25% <100.00%> (ø)` | |
   | [rust/datafusion/src/execution/dataframe\_impl.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vZGF0YWZyYW1lX2ltcGwucnM=) | `96.26% <100.00%> (+2.98%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/csv.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Nzdi5ycw==) | `82.78% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BhcnF1ZXQucnM=) | `79.78% <100.00%> (+0.21%)` | :arrow_up: |
   | ... and [27 more](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [5819943...5fb330f](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #8967: ARROW-10967: [Rust] Make env vars optional

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748636219


   Still working


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8967: Arrow-10967: [Rust] Make env vars ARROW_TEST_DATA and PARQUET_TEST_DATA optional

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748457504


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #8967: ARROW-10967: [Rust] Add functions for test data to mod arrow::util::test_util

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-749443620


   > Could you describe (or better illustrate in the PR) how these functions are intended to be used? It is difficult to evaluate whether the API fits a use-case if they are not called anywhere.
   
   Thanks! New changes:
   
   - rename function names to lower case
   - moved the new codes to `test_util.rs`
   - fix and test
   - update PR title and content
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8967: ARROW-10967: [Rust] Add mod arrow::util::test_data_dir

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8967:
URL: https://github.com/apache/arrow/pull/8967#issuecomment-748460013


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=h1) Report
   > Merging [#8967](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=desc) (e608972) into [master](https://codecov.io/gh/apache/arrow/commit/a054c78813c68b99abc0df9db87ca1e530d324d7?el=desc) (a054c78) will **decrease** coverage by `0.50%`.
   > The diff coverage is `82.05%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8967/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8967      +/-   ##
   ==========================================
   - Coverage   83.20%   82.69%   -0.51%     
   ==========================================
     Files         199      201       +2     
     Lines       48857    49639     +782     
   ==========================================
   + Hits        40651    41050     +399     
   - Misses       8206     8589     +383     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/util/test\_data\_dir.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X2RhdGFfZGlyLnJz) | `82.05% <82.05%> (ø)` | |
   | [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.49% <0.00%> (-1.51%)` | :arrow_down: |
   | [rust/arrow/src/ipc/convert.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2NvbnZlcnQucnM=) | `91.90% <0.00%> (-1.11%)` | :arrow_down: |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `82.86% <0.00%> (-0.84%)` | :arrow_down: |
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `90.62% <0.00%> (-0.70%)` | :arrow_down: |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `89.42% <0.00%> (-0.57%)` | :arrow_down: |
   | [rust/arrow-flight/src/utils.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy1mbGlnaHQvc3JjL3V0aWxzLnJz) | `0.00% <0.00%> (ø)` | |
   | [rust/benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree#diff-cnVzdC9iZW5jaG1hcmtzL3NyYy9iaW4vdHBjaC5ycw==) | `0.00% <0.00%> (ø)` | |
   | ... and [17 more](https://codecov.io/gh/apache/arrow/pull/8967/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=footer). Last update [a054c78...9574571](https://codecov.io/gh/apache/arrow/pull/8967?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org