Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/30 20:19:06 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

jorgecarleitao opened a new pull request #8313:
URL: https://github.com/apache/arrow/pull/8313


   Updates the top level README with more details about the crates, their current functionality, and what they enable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8313:
URL: https://github.com/apache/arrow/pull/8313#issuecomment-702809460


   @andygrove, sure! I am stuck on the last step of https://gitbox.apache.org/setup/
   
   > User not a member of the ASF GitHub organisation. Please make sure you are a part of the ASF Organisation on GitHub and have 2FA enabled. Visit id.apache.org and set your GitHub ID to be invited to the org.





[GitHub] [arrow] jorgecarleitao commented on pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8313:
URL: https://github.com/apache/arrow/pull/8313#issuecomment-702506137


   Actually, I had to create both versions, as they won't really work for sub-projects now.





[GitHub] [arrow] jorgecarleitao closed pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #8313:
URL: https://github.com/apache/arrow/pull/8313


   





[GitHub] [arrow] jorgecarleitao commented on pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8313:
URL: https://github.com/apache/arrow/pull/8313#issuecomment-703164946


   @andygrove, I was able to merge this now. First commit 🎉





[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8313:
URL: https://github.com/apache/arrow/pull/8313#discussion_r498603249



##########
File path: rust/README.md
##########
@@ -51,11 +72,11 @@ This populates data in two git submodules:
 Create two new environment variables to point to these directories as follows:
 
 ```bash
-export PARQUET_TEST_DATA=/path/to/arrow/cpp/submodules/parquet-testing/data
-export ARROW_TEST_DATA=/path/to/arrow/testing/data/
+export PARQUET_TEST_DATA=../../cpp/submodules/parquet-testing/data

Review comment:
       You are right, good catch. I have fixed it.







[GitHub] [arrow] alamb commented on a change in pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8313:
URL: https://github.com/apache/arrow/pull/8313#discussion_r498342317



##########
File path: rust/README.md
##########
@@ -21,17 +21,38 @@
 
 [![Coverage Status](https://codecov.io/gh/apache/arrow/rust/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/arrow?branch=master)
 
-## The Rust implementation of Arrow consists of the following crates
+Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust.
+
+This part of the Arrow project is divided in 4 main components:
 
 | Crate     | Description | Documentation |
 |-----------|-------------|---------------|
-|Arrow      | Core functionality (memory layout, array builders, low level computations) | [(README)](arrow/README.md) |
-|Parquet    | Parquet support | [(README)](parquet/README.md) |
-|DataFusion | In-memory query engine with SQL support | [(README)](datafusion/README.md) |
+|Arrow        | Core functionality (memory layout, arrays, low level computations) | [(README)](arrow/README.md) |
+|Parquet      | Parquet support | [(README)](parquet/README.md) |
+|Arrow-flight | Arrow data between processes | [(README)](arrow-flight/README.md) |
+|DataFusion   | In-memory query engine with SQL support | [(README)](datafusion/README.md) |
+
+Independently, they support a vast array of functionality for in-memory computations.
+
+Together, they allow users to write an SQL query or a `DataFrame` (using `DataFusion` crate), run it against a parquet file (using `Parquet` crate) and have it evaluated in-memory using Arrow's columnar format (using the `Arrow` crate), and sent it over to another process (using `Arrow-flight` crate).
+
+Generally speaking, `Arrow`'s has most functionality to develop under the Arrow format, and `DataFusion` offers most operations typically found in SQL, with the notable execeptions of:

Review comment:
       ```suggestion
   Generally speaking, the `arrow` crate offers the functionality to develop code that uses Arrow arrays, and `datafusion` offers most operations typically found in SQL, with the notable exceptions of:
   ```

##########
File path: rust/README.md
##########
@@ -21,17 +21,38 @@
 
 [![Coverage Status](https://codecov.io/gh/apache/arrow/rust/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/arrow?branch=master)
 
-## The Rust implementation of Arrow consists of the following crates
+Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust.
+
+This part of the Arrow project is divided in 4 main components:
 
 | Crate     | Description | Documentation |
 |-----------|-------------|---------------|
-|Arrow      | Core functionality (memory layout, array builders, low level computations) | [(README)](arrow/README.md) |
-|Parquet    | Parquet support | [(README)](parquet/README.md) |
-|DataFusion | In-memory query engine with SQL support | [(README)](datafusion/README.md) |
+|Arrow        | Core functionality (memory layout, arrays, low level computations) | [(README)](arrow/README.md) |
+|Parquet      | Parquet support | [(README)](parquet/README.md) |
+|Arrow-flight | Arrow data between processes | [(README)](arrow-flight/README.md) |
+|DataFusion   | In-memory query engine with SQL support | [(README)](datafusion/README.md) |
+
+Independently, they support a vast array of functionality for in-memory computations.
+
+Together, they allow users to write an SQL query or a `DataFrame` (using `DataFusion` crate), run it against a parquet file (using `Parquet` crate) and have it evaluated in-memory using Arrow's columnar format (using the `Arrow` crate), and sent it over to another process (using `Arrow-flight` crate).

Review comment:
       ```suggestion
   Together, they allow users to write an SQL query or a `DataFrame` (using `datafusion` crate), run it against a parquet file (using `parquet` crate), evaluate it in-memory using Arrow's columnar format (using the `arrow` crate), and send to another process (using `arrow-flight` crate).
   ```







[GitHub] [arrow] andygrove commented on a change in pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #8313:
URL: https://github.com/apache/arrow/pull/8313#discussion_r498601108



##########
File path: rust/README.md
##########
@@ -51,11 +72,11 @@ This populates data in two git submodules:
 Create two new environment variables to point to these directories as follows:
 
 ```bash
-export PARQUET_TEST_DATA=/path/to/arrow/cpp/submodules/parquet-testing/data
-export ARROW_TEST_DATA=/path/to/arrow/testing/data/
+export PARQUET_TEST_DATA=../../cpp/submodules/parquet-testing/data

Review comment:
       Relative paths may not work when running `cargo test` from within the crate directories themselves rather than from the workspace directory. Perhaps we could do this instead?
   
   ```
   export PARQUET_TEST_DATA=`pwd`/../../cpp/submodules/parquet-testing/data
   export ARROW_TEST_DATA=`pwd`/../../testing/data/
   ```
   
   I'm assuming this works both on Mac and Linux, at least.
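
A minimal sketch of the same idea, assuming a POSIX shell: resolve each relative path to an absolute one once, so the variables stay valid whichever directory `cargo test` is later run from. `abs_dir` is a hypothetical helper introduced here for illustration, not part of the Arrow repository, and the relative paths are copied from the suggestion above.

```bash
# abs_dir is a hypothetical helper (not part of the Arrow repository).
# It prints the absolute path of an existing directory and fails if the
# directory does not exist. The subshell leaves the caller's cwd untouched;
# `CDPATH=` prevents `cd` from searching (and echoing) CDPATH entries.
abs_dir() {
  (CDPATH= cd -- "$1" 2>/dev/null && pwd)
}

# Relative paths taken from the suggestion above; adjust them to your checkout.
# If a directory is missing, the variable is left unchanged and a warning is printed.
if dir="$(abs_dir ../../cpp/submodules/parquet-testing/data)"; then
  export PARQUET_TEST_DATA="$dir"
else
  echo "warning: parquet-testing data directory not found" >&2
fi

if dir="$(abs_dir ../../testing/data)"; then
  export ARROW_TEST_DATA="$dir"
else
  echo "warning: arrow testing data directory not found" >&2
fi
```

Unlike a bare `` `pwd`/../.. `` expansion, this checks that the directory exists and yields a path without `..` components; quoting the expansions also keeps the values intact if the checkout path contains spaces.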







[GitHub] [arrow] github-actions[bot] commented on pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8313:
URL: https://github.com/apache/arrow/pull/8313#issuecomment-701622678


   https://issues.apache.org/jira/browse/ARROW-4927





[GitHub] [arrow] andygrove commented on pull request #8313: ARROW-4927: [Rust] Update top level README to describe current functionality

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8313:
URL: https://github.com/apache/arrow/pull/8313#issuecomment-702805331


   @jorgecarleitao Now that you are a committer do you want to merge this one to get to know the process? I usually make sure I have the latest from the master branch and then run `./dev/merge_pr.py`. 

