You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/23 12:33:40 UTC

[GitHub] [arrow] alamb opened a new pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

alamb opened a new pull request #8996:
URL: https://github.com/apache/arrow/pull/8996


   This PR is based on @mqy 's great work in: https://github.com/apache/arrow/pull/8967. 
   
   (If we want to take this PR, we can either merge it in to https://github.com/apache/arrow/pull/8967/files# directly or I can make a new independent PR when that is merged).
   
   The outcome is that developers can now simply run  `cargo test` in a typical checkout without having to mess with environment variables. I think this will lower the barrier to entry for people to contribute. 
   
   The changes are:
   1. Code from https://github.com/apache/arrow/pull/8967 to encode heuristics of where to check for test data
   1. Remove all references to ARROW_TEST_DATA and PARQUET_TEST_DATA and uses the test_util methods instead
   2. Update the comments / error messages in test_util
   
   ## Example Error Handling
   
   Error handling: here is what happens with a fresh checkout and no git modules checked out  and no environment variables set:
   
   ```
   cargo test -p arrow
   ---- ipc::reader::tests::read_decimal_be_file_should_panic stdout ----
   thread 'ipc::reader::tests::read_decimal_be_file_should_panic' panicked at 'failed to get arrow data dir: env `ARROW_TEST_DATA` is undefined or has empty value, and the pre-defined data dir `/private/tmp/arrow/rust/arrow/../../testing/data` not found
   HINT: try running `git submodule update --init`', arrow/src/util/test_util.rs:81:21
   ```
   
   Here is an example of what happens when `ARROW_TEST_DATA` is pointing somewhere non existent
   
   ```
   ARROW_TEST_DATA=blargh cargo test -p arrow
   ...
   --- ipc::reader::tests::read_decimal_be_file_should_panic stdout ----
   thread 'ipc::reader::tests::read_decimal_be_file_should_panic' panicked at 'failed to get arrow data dir: the data dir `blargh` defined by env ARROW_TEST_DATA not found', arrow/src/util/test_util.rs:81:21
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548518038



##########
File path: rust/arrow/src/util/test_util.rs
##########
@@ -60,3 +60,152 @@ pub fn get_temp_file(file_name: &str, content: &[u8]) -> fs::File {
     assert!(file.is_ok());
     file.unwrap()
 }
+
+/// Returns the arrow test data directory, which is by default stored
+/// in a git submodule rooted at `arrow/testing/data`.
+///
+/// The default can be overridden by the optional environment
+/// variable `ARROW_TEST_DATA`
+///
+/// panics when the directory can not be found.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::arrow_test_data();
+/// let csvdata = format!("{}/csv/aggregate_test_100.csv", testdata);
+/// assert!(std::path::PathBuf::from(csvdata).exists());
+/// ```
+pub fn arrow_test_data() -> String {
+    match get_data_dir("ARROW_TEST_DATA", "../../testing/data") {
+        Ok(pb) => pb.display().to_string(),
+        Err(err) => panic!(format!("failed to get arrow data dir: {}", err)),
+    }
+}
+
+/// Returns the parquest test data directory, which is by default
+/// stored in a git submodule rooted at
+/// `arrow/cpp/submodules/parquest-testing/data`.
+///
+/// The default can be overridden by the optional environment variable
+/// `PARQUET_TEST_DATA`
+///
+/// panics when the directory can not be found.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::parquet_test_data();
+/// let filename = format!("{}/binary.parquet", testdata);
+/// assert!(std::path::PathBuf::from(filename).exists());
+/// ```
+pub fn parquet_test_data() -> String {
+    match get_data_dir(
+        "PARQUET_TEST_DATA",

Review comment:
       Good call -- I will move it to the parquet crate




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548537221



##########
File path: rust/parquet/src/util/test_common/file_util.rs
##########
@@ -19,17 +19,8 @@ use std::{env, fs, io::Write, path::PathBuf, str::FromStr};
 
 /// Returns path to the test parquet file in 'data' directory
 pub fn get_test_path(file_name: &str) -> PathBuf {
-    let mut pathbuf = match env::var("PARQUET_TEST_DATA") {
-        Ok(path) => PathBuf::from_str(path.as_str()).unwrap(),
-        Err(_) => {
-            let mut pathbuf = env::current_dir().unwrap();
-            pathbuf.pop();
-            pathbuf.pop();
-            pathbuf
-                .push(PathBuf::from_str("cpp/submodules/parquet-testing/data").unwrap());
-            pathbuf
-        }
-    };
+    let mut pathbuf =
+        PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap();

Review comment:
       +1 for the first choice right now. Perhaps these functions should stay in a top crate parallel to arrow crate, as common utilities for testing or something else. We may crate that crate later.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#issuecomment-750906670


   @alamb those two envs also stay in rust/arrow/README.md
   How about remove them from rust/arrow/README.md, or link user to to rust/README.md ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548519493



##########
File path: rust/parquet/src/util/test_common/file_util.rs
##########
@@ -19,17 +19,8 @@ use std::{env, fs, io::Write, path::PathBuf, str::FromStr};
 
 /// Returns path to the test parquet file in 'data' directory
 pub fn get_test_path(file_name: &str) -> PathBuf {
-    let mut pathbuf = match env::var("PARQUET_TEST_DATA") {
-        Ok(path) => PathBuf::from_str(path.as_str()).unwrap(),
-        Err(_) => {
-            let mut pathbuf = env::current_dir().unwrap();
-            pathbuf.pop();
-            pathbuf.pop();
-            pathbuf
-                .push(PathBuf::from_str("cpp/submodules/parquet-testing/data").unwrap());
-            pathbuf
-        }
-    };
+    let mut pathbuf =
+        PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap();

Review comment:
       Will do




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#issuecomment-750880297


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=h1) Report
   > Merging [#8996](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=desc) (195e743) into [master](https://codecov.io/gh/apache/arrow/commit/1ecef42bb9fb9e91f0fb04c7d5a1c3be58390025?el=desc) (1ecef42) will **increase** coverage by `0.00%`.
   > The diff coverage is `96.15%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8996/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #8996   +/-   ##
   =======================================
     Coverage   82.65%   82.66%           
   =======================================
     Files         200      200           
     Lines       49795    49790    -5     
   =======================================
     Hits        41159    41159           
   + Misses       8636     8631    -5     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/util/test\_util.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X3V0aWwucnM=) | `90.90% <ø> (ø)` | |
   | [rust/datafusion/examples/flight\_server.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL2V4YW1wbGVzL2ZsaWdodF9zZXJ2ZXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/datasource/csv.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL2Nzdi5ycw==) | `81.25% <ø> (ø)` | |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `82.86% <100.00%> (ø)` | |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `83.25% <100.00%> (ø)` | |
   | [rust/datafusion/src/datasource/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL3BhcnF1ZXQucnM=) | `96.92% <100.00%> (ø)` | |
   | [rust/datafusion/src/execution/dataframe\_impl.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vZGF0YWZyYW1lX2ltcGwucnM=) | `96.26% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/csv.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Nzdi5ycw==) | `82.78% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BhcnF1ZXQucnM=) | `80.46% <100.00%> (+0.15%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `80.45% <100.00%> (ø)` | |
   | ... and [4 more](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=footer). Last update [ca685a0...195e743](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548092727



##########
File path: rust/arrow-flight/src/arrow.flight.protocol.rs
##########
@@ -498,9 +498,8 @@ pub mod flight_service_server {
     #[async_trait]
     pub trait FlightService: Send + Sync + 'static {
         #[doc = "Server streaming response type for the Handshake method."]
-        type HandshakeStream: Stream<
-            Item = Result<super::HandshakeResponse, tonic::Status>,
-        > + Send
+        type HandshakeStream: Stream<Item = Result<super::HandshakeResponse, tonic::Status>>

Review comment:
       Will have a look. I saw it being changed during a PR indeed, thought it had to do with a recent rust upgrade. Seems that the CI accepts both alternatives, so I guess it's a newer version doing more formatting.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r547997352



##########
File path: rust/arrow/src/util/test_util.rs
##########
@@ -60,3 +60,152 @@ pub fn get_temp_file(file_name: &str, content: &[u8]) -> fs::File {
     assert!(file.is_ok());
     file.unwrap()
 }
+
+/// Returns the arrow test data directory, which is by default stored
+/// in a git submodule rooted at `arrow/testing/data`.
+///
+/// The default can be overridden by the optional environment
+/// variable `ARROW_TEST_DATA`
+///
+/// panics when the directory can not be found.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::arrow_test_data();
+/// let csvdata = format!("{}/csv/aggregate_test_100.csv", testdata);
+/// assert!(std::path::PathBuf::from(csvdata).exists());
+/// ```
+pub fn arrow_test_data() -> String {
+    match get_data_dir("ARROW_TEST_DATA", "../../testing/data") {
+        Ok(pb) => pb.display().to_string(),
+        Err(err) => panic!(format!("failed to get arrow data dir: {}", err)),
+    }
+}
+
+/// Returns the parquest test data directory, which is by default
+/// stored in a git submodule rooted at
+/// `arrow/cpp/submodules/parquest-testing/data`.
+///
+/// The default can be overridden by the optional environment variable
+/// `PARQUET_TEST_DATA`
+///
+/// panics when the directory can not be found.
+///
+/// Example:
+/// ```
+/// let testdata = arrow::util::test_util::parquet_test_data();
+/// let filename = format!("{}/binary.parquet", testdata);
+/// assert!(std::path::PathBuf::from(filename).exists());
+/// ```
+pub fn parquet_test_data() -> String {
+    match get_data_dir(
+        "PARQUET_TEST_DATA",

Review comment:
       I don't expect us to use this in the `arrow` crate

##########
File path: rust/README.md
##########
@@ -92,9 +92,12 @@ This populates data in two git submodules:
 - `../cpp/submodules/parquet_testing/data` (sourced from https://github.com/apache/parquet-testing.git)
 - `../testing` (sourced from https://github.com/apache/arrow-testing)
 
-The following Env vars are required to run `cargo test`, examples, etc.
+By default, `cargo test` will look for these directories at their
+standard location. The following Env vars can be used to override the
+location shoud you choose

Review comment:
       ```suggestion
   location should you choose.
   ```

##########
File path: rust/arrow-flight/src/arrow.flight.protocol.rs
##########
@@ -498,9 +498,8 @@ pub mod flight_service_server {
     #[async_trait]
     pub trait FlightService: Send + Sync + 'static {
         #[doc = "Server streaming response type for the Handshake method."]
-        type HandshakeStream: Stream<
-            Item = Result<super::HandshakeResponse, tonic::Status>,
-        > + Send
+        type HandshakeStream: Stream<Item = Result<super::HandshakeResponse, tonic::Status>>

Review comment:
       There might be a `rustfmt` version mismatch that's going to have us changing this back and forth. I've also seen rustfmt suggest changing this, but I've been ignoring it so far. What version of `rustfmt` are you using @Dandandan ?

##########
File path: rust/parquet/src/util/test_common/file_util.rs
##########
@@ -19,17 +19,8 @@ use std::{env, fs, io::Write, path::PathBuf, str::FromStr};
 
 /// Returns path to the test parquet file in 'data' directory
 pub fn get_test_path(file_name: &str) -> PathBuf {
-    let mut pathbuf = match env::var("PARQUET_TEST_DATA") {
-        Ok(path) => PathBuf::from_str(path.as_str()).unwrap(),
-        Err(_) => {
-            let mut pathbuf = env::current_dir().unwrap();
-            pathbuf.pop();
-            pathbuf.pop();
-            pathbuf
-                .push(PathBuf::from_str("cpp/submodules/parquet-testing/data").unwrap());
-            pathbuf
-        }
-    };
+    let mut pathbuf =
+        PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap();

Review comment:
       Yea, better to move the parquet test data fn into this crate, because `arrow` is an optional dependency




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#issuecomment-751352790


   @nevi-me  -- I have not been able to find a satisfactory way to avoid using the common `parquet_test_data` function from the `arrow` crate in the parquet crate. Since they are only used in tests, I think this is ok -- however, if you would like a different organization please let me know and I can do it as a follow on PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548066337



##########
File path: rust/arrow-flight/src/arrow.flight.protocol.rs
##########
@@ -498,9 +498,8 @@ pub mod flight_service_server {
     #[async_trait]
     pub trait FlightService: Send + Sync + 'static {
         #[doc = "Server streaming response type for the Handshake method."]
-        type HandshakeStream: Stream<
-            Item = Result<super::HandshakeResponse, tonic::Status>,
-        > + Send
+        type HandshakeStream: Stream<Item = Result<super::HandshakeResponse, tonic::Status>>

Review comment:
       Me too




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548059740



##########
File path: rust/parquet/src/util/test_common/file_util.rs
##########
@@ -19,17 +19,8 @@ use std::{env, fs, io::Write, path::PathBuf, str::FromStr};
 
 /// Returns path to the test parquet file in 'data' directory
 pub fn get_test_path(file_name: &str) -> PathBuf {
-    let mut pathbuf = match env::var("PARQUET_TEST_DATA") {
-        Ok(path) => PathBuf::from_str(path.as_str()).unwrap(),
-        Err(_) => {
-            let mut pathbuf = env::current_dir().unwrap();
-            pathbuf.pop();
-            pathbuf.pop();
-            pathbuf
-                .push(PathBuf::from_str("cpp/submodules/parquet-testing/data").unwrap());
-            pathbuf
-        }
-    };
+    let mut pathbuf =
+        PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap();

Review comment:
       Agree with @nevi-me
   
   Actually I had asked similar question "where to put them?", finally I decided leaving this question to reviewers, because the most important thing at that time is coding.
   
   At present, with this PR, thanks to @alamb, we finally got a chance to discuss this issue. Actually I don't mind how to improve or split the code in #8967 at all, as long as the fundamental purpose could be achieved, and they do not mess up the existing testing code  too much :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#issuecomment-750288071


   https://issues.apache.org/jira/browse/ARROW-10967


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548098965



##########
File path: rust/arrow-flight/src/arrow.flight.protocol.rs
##########
@@ -498,9 +498,8 @@ pub mod flight_service_server {
     #[async_trait]
     pub trait FlightService: Send + Sync + 'static {
         #[doc = "Server streaming response type for the Handshake method."]
-        type HandshakeStream: Stream<
-            Item = Result<super::HandshakeResponse, tonic::Status>,
-        > + Send
+        type HandshakeStream: Stream<Item = Result<super::HandshakeResponse, tonic::Status>>

Review comment:
       Looks like I probably was using a more recent version indeed and have it on the nightly version by default:
   `rustfmt 1.4.29-nightly (70ce182 2020-12-04)`
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#issuecomment-751487244


   Hey @alamb, it's fine, we can merge the changes as they are. Thanks for trying to move the function.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548063504



##########
File path: rust/arrow-flight/src/arrow.flight.protocol.rs
##########
@@ -498,9 +498,8 @@ pub mod flight_service_server {
     #[async_trait]
     pub trait FlightService: Send + Sync + 'static {
         #[doc = "Server streaming response type for the Handshake method."]
-        type HandshakeStream: Stream<
-            Item = Result<super::HandshakeResponse, tonic::Status>,
-        > + Send
+        type HandshakeStream: Stream<Item = Result<super::HandshakeResponse, tonic::Status>>

Review comment:
       I've been seeing this locally too, but not all the time




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8996: ARROW-10967 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548059740



##########
File path: rust/parquet/src/util/test_common/file_util.rs
##########
@@ -19,17 +19,8 @@ use std::{env, fs, io::Write, path::PathBuf, str::FromStr};
 
 /// Returns path to the test parquet file in 'data' directory
 pub fn get_test_path(file_name: &str) -> PathBuf {
-    let mut pathbuf = match env::var("PARQUET_TEST_DATA") {
-        Ok(path) => PathBuf::from_str(path.as_str()).unwrap(),
-        Err(_) => {
-            let mut pathbuf = env::current_dir().unwrap();
-            pathbuf.pop();
-            pathbuf.pop();
-            pathbuf
-                .push(PathBuf::from_str("cpp/submodules/parquet-testing/data").unwrap());
-            pathbuf
-        }
-    };
+    let mut pathbuf =
+        PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap();

Review comment:
       Agree with @nevi-me
   
   Actually I had asked similar question "where to put them?", finally I decided leaving this question to reviewers, because the most important thing at that time is coding.
   
   At present, with this PR, thanks to @alamb, we finally got a chance to discuss this issue. Actually I don't mind how to improve or split the code in #8967 at all, as long as the fundamental purpose could be achieved, and they do not mess up the existing code  too much :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548523886



##########
File path: rust/parquet/src/util/test_common/file_util.rs
##########
@@ -19,17 +19,8 @@ use std::{env, fs, io::Write, path::PathBuf, str::FromStr};
 
 /// Returns path to the test parquet file in 'data' directory
 pub fn get_test_path(file_name: &str) -> PathBuf {
-    let mut pathbuf = match env::var("PARQUET_TEST_DATA") {
-        Ok(path) => PathBuf::from_str(path.as_str()).unwrap(),
-        Err(_) => {
-            let mut pathbuf = env::current_dir().unwrap();
-            pathbuf.pop();
-            pathbuf.pop();
-            pathbuf
-                .push(PathBuf::from_str("cpp/submodules/parquet-testing/data").unwrap());
-            pathbuf
-        }
-    };
+    let mut pathbuf =
+        PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap();

Review comment:
       @nevi-me  and @mqy  -- I tried to move `parquet_test_data` into https://github.com/apache/arrow/blob/master/rust/parquet/src/util/test_common/file_util.rs -- however, the code quickly got messy because `parquet::util` is not publically exported and thus I can't use functions defined there in places (like datafusion) outside the parquet crate. Furthermore, the `test_utils` are only compiled in `test` config, but several datafusion examples use the parquet test data but they are not compiled in `test` config. 
   
   I can think of several possibilities:
   1. Leave the `parquet_test_data` function in the arrow crate as it is in this PR
   2. Make a copy of parquet_test_data in the parquet crate
   3. Make the parquet util module public and export test_util in all configurations
   
   Given that this function is used in tests and the other options seem messy to me, I suggest number 1 (though perhaps I am being lazy)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#issuecomment-750880297


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=h1) Report
   > Merging [#8996](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=desc) (84ceb2b) into [master](https://codecov.io/gh/apache/arrow/commit/8fa68b03a6d4fc6e0e76bfa46b3fe9aa77d9f002?el=desc) (8fa68b0) will **increase** coverage by `0.00%`.
   > The diff coverage is `96.15%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8996/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #8996   +/-   ##
   =======================================
     Coverage   82.87%   82.88%           
   =======================================
     Files         201      201           
     Lines       49729    49734    +5     
   =======================================
   + Hits        41213    41220    +7     
   + Misses       8516     8514    -2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/util/test\_util.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC90ZXN0X3V0aWwucnM=) | `90.90% <ø> (ø)` | |
   | [rust/datafusion/examples/flight\_server.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL2V4YW1wbGVzL2ZsaWdodF9zZXJ2ZXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/datasource/csv.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL2Nzdi5ycw==) | `81.25% <ø> (ø)` | |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `82.86% <100.00%> (ø)` | |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `83.25% <100.00%> (ø)` | |
   | [rust/datafusion/src/datasource/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL3BhcnF1ZXQucnM=) | `96.92% <100.00%> (ø)` | |
   | [rust/datafusion/src/execution/dataframe\_impl.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vZGF0YWZyYW1lX2ltcGwucnM=) | `93.47% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/csv.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Nzdi5ycw==) | `82.78% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/parquet.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BhcnF1ZXQucnM=) | `80.46% <100.00%> (+0.15%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `77.46% <100.00%> (ø)` | |
   | ... and [5 more](https://codecov.io/gh/apache/arrow/pull/8996/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=footer). Last update [8fa68b0...84ceb2b](https://codecov.io/gh/apache/arrow/pull/8996?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548537221



##########
File path: rust/parquet/src/util/test_common/file_util.rs
##########
@@ -19,17 +19,8 @@ use std::{env, fs, io::Write, path::PathBuf, str::FromStr};
 
 /// Returns path to the test parquet file in 'data' directory
 pub fn get_test_path(file_name: &str) -> PathBuf {
-    let mut pathbuf = match env::var("PARQUET_TEST_DATA") {
-        Ok(path) => PathBuf::from_str(path.as_str()).unwrap(),
-        Err(_) => {
-            let mut pathbuf = env::current_dir().unwrap();
-            pathbuf.pop();
-            pathbuf.pop();
-            pathbuf
-                .push(PathBuf::from_str("cpp/submodules/parquet-testing/data").unwrap());
-            pathbuf
-        }
-    };
+    let mut pathbuf =
+        PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap();

Review comment:
       +1 for the first choice right now. Perhaps these functions should stay in a top crate parallel to arrow crate, as common utilities for testing or something else. We may create that crate later.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#issuecomment-750874224


   https://issues.apache.org/jira/browse/ARROW-11026


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8996:
URL: https://github.com/apache/arrow/pull/8996


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548979463



##########
File path: rust/arrow-flight/src/arrow.flight.protocol.rs
##########
@@ -498,9 +498,8 @@ pub mod flight_service_server {
     #[async_trait]
     pub trait FlightService: Send + Sync + 'static {
         #[doc = "Server streaming response type for the Handshake method."]
-        type HandshakeStream: Stream<
-            Item = Result<super::HandshakeResponse, tonic::Status>,
-        > + Send
+        type HandshakeStream: Stream<Item = Result<super::HandshakeResponse, tonic::Status>>

Review comment:
       FWIW I removed the changes to `rust/arrow-flight/src/arrow.flight.protocol.rs` from this PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8996: ARROW-11026 [Rust]: Run tests without requiring environment variables

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#issuecomment-751352578


   > How about remove them from rust/arrow/README.md, or link user to to rust/README.md ?
   
   Good idea @mqy  -- done. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org