Posted to jira@arrow.apache.org by "meng qingyou (Jira)" <ji...@apache.org> on 2020/12/18 21:19:00 UTC

[jira] [Updated] (ARROW-10967) Make env vars ARROW_TEST_DATA and PARQUET_TEST_DATA optional

     [ https://issues.apache.org/jira/browse/ARROW-10967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

meng qingyou updated ARROW-10967:
---------------------------------
    Description: 
Facts/problems:
 # Two env vars, *ARROW_TEST_DATA* and *PARQUET_TEST_DATA*, must be set in order to run tests, benchmarks, and examples.
 # In total, eighteen .rs files use these environment variables.
 # The typical usage looks like this: ```
 let testdata =
 std::env::var("PARQUET_TEST_DATA").expect("PARQUET_TEST_DATA not defined");```
 # Some code tries to assemble the test data directory by appending a relative dir to the *current dir* of the running process, but that depends heavily on the actual current dir (for example, rust/, rust/datafusion, etc.).

Here is my solution:

Suppose:
 # *current_dir* is *ALWAYS* inside the *git workspace dir*
 # We know a *data dir X relative to git workspace dir*

Then getting the absolute dir of *X* reduces to getting the absolute dir of the *git workspace dir*.

Given the *current dir* (inside the *git workspace dir*), we visit the dir and its parents and check whether ".git" (file or dir) exists. The first dir that contains ".git" SHOULD be the *git workspace dir*.
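
The lookup described above could be sketched roughly as follows. This is only an illustration under assumptions, not the change proposed for the Arrow code base: the function names find_git_workspace_dir and test_data_dir are hypothetical, and "testing/data" is a placeholder for the relative dir *X*. The env var is still honored when set, so it becomes optional rather than required.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Walk from `start` up through its ancestors and return the first
/// directory that contains a ".git" entry (file or directory).
/// Hypothetical helper name, not an existing Arrow function.
fn find_git_workspace_dir(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}

/// Resolve a test data directory: use the env var when it is set,
/// otherwise fall back to `<git workspace dir>/<relative_dir>`.
/// Returns None when the env var is unset and no ".git" is found.
fn test_data_dir(env_var: &str, relative_dir: &str) -> Option<PathBuf> {
    if let Ok(dir) = env::var(env_var) {
        return Some(PathBuf::from(dir));
    }
    let cwd = env::current_dir().ok()?;
    find_git_workspace_dir(&cwd).map(|root| root.join(relative_dir))
}

fn main() {
    // "testing/data" is a placeholder for X, not a path taken from the issue.
    match test_data_dir("ARROW_TEST_DATA", "testing/data") {
        Some(dir) => println!("test data dir: {}", dir.display()),
        None => println!("env var unset and no .git found above the current dir"),
    }
}
```

Checking with exists() rather than is_dir() matters because ".git" may be a plain file, for example in a git worktree or submodule checkout.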

 

  was:
Facts/problems:
 # Two vars *c* and *PARQUET_TEST_DATA* are required by be set for running tests,  benchmarks, examples.
 # There are totally eighteen .rs files use these environment variables.
 # The major usage likes this: ```
let testdata =
std::env::var("PARQUET_TEST_DATA").expect("PARQUET_TEST_DATA not defined");```
 # Somebody tried to assembly the test data directories by appending relative dir to *current dir* of current running process, but highly depend on the actual current dir (for example, rust/, rust/datafusion, etc.).

Here is my solution:

Suppose:
 # *current_dir* is ALWAYS inside the *git workspace dir*
 # We know an *absolute dir X relative to git workspace dir*

Get absolute dir of X == get absolute dir *TOP* of *git workspace dir*.

Given *current dir* (in *git workspace dir*),we visit the dir and it's parents, check if ."git"  (file or dir)exists. The first dir that contains ".git" SHOULD be *git workspace dir*.

 


> Make env vars ARROW_TEST_DATA and PARQUET_TEST_DATA optional
> ------------------------------------------------------------
>
>                 Key: ARROW-10967
>                 URL: https://issues.apache.org/jira/browse/ARROW-10967
>             Project: Apache Arrow
>          Issue Type: Test
>            Reporter: meng qingyou
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)