Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/25 16:03:17 UTC

[GitHub] [arrow-datafusion] jorgecarleitao opened a new pull request #69: Add datafusion-python

jorgecarleitao opened a new pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69


   This is a PR with the source code of python-datafusion, currently available at https://github.com/jorgecarleitao/datafusion-python and released in pypi as [datafusion](https://pypi.org/project/datafusion/).
   
   The goal of this PR is to gauge interest in moving that code base closer to datafusion and into the ASF.
   
   Some notes:
   * The documentation is lacking, but I would hope to have the docs published somewhere via readthedocs.
   * The release builds wheels for windows, mac, and manylinux2010, which cover some of the user-base
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org






[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r619840357



##########
File path: python/LICENSE
##########
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Jorge Leitao

Review comment:
       This presumably needs to change to an ASF copyright?







[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r619840491



##########
File path: python/LICENSE
##########
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Jorge Leitao

Review comment:
       Also, I assume we need to license under ASL 2.0 as well







[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-829523322


   Pushed the license and also hopefully fixed the CI.





[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-826350048


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#69](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3d9e1a3) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/2423ff0dd1fe9c0932c1cb8d1776efa3acd69554?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2423ff0) will **decrease** coverage by `0.66%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/69/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master      #69      +/-   ##
   ==========================================
   - Coverage   76.43%   75.77%   -0.67%     
   ==========================================
     Files         135      142       +7     
     Lines       23264    23467     +203     
   ==========================================
     Hits        17782    17782              
   - Misses       5482     5685     +203     
   ```
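
The figures in the table above follow from hits divided by lines. A minimal sketch of that arithmetic, assuming the percentages are rounded down to two decimals; the report does not state its rounding, but this assumption reproduces both the 76.43% base figure and the -0.67 delta. The helper name `floor2` is illustrative only.

```python
import math


def floor2(value: float) -> float:
    """Round down to two decimals (assumed rounding, not confirmed by the report)."""
    return math.floor(value * 100) / 100


# Numbers taken from the coverage table above.
base_hits, base_lines = 17782, 23264   # master
head_hits, head_lines = 17782, 23467   # this pull request

base = floor2(base_hits / base_lines * 100)
head = floor2(head_hits / head_lines * 100)
delta = floor2((head_hits / head_lines - base_hits / base_lines) * 100)

# Every added line is uncovered, which matches the 0.00% diff coverage.
new_lines = head_lines - base_lines
new_hits = head_hits - base_hits

print(base, head, delta, new_lines, new_hits)
```

Since the hit count is unchanged while 203 lines are added, the drop comes entirely from the new, untested Python binding sources.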
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [python/src/context.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9jb250ZXh0LnJz) | `0.00% <0.00%> (ø)` | |
   | [python/src/dataframe.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9kYXRhZnJhbWUucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/errors.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9lcnJvcnMucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/expression.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9leHByZXNzaW9uLnJz) | `0.00% <0.00%> (ø)` | |
   | [python/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9zY2FsYXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/types.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy90eXBlcy5ycw==) | `0.00% <0.00%> (ø)` | |
   | [python/src/udaf.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy91ZGFmLnJz) | `0.00% <0.00%> (ø)` | |
   | ... and [4 more](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [2423ff0...3d9e1a3](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-datafusion] alamb commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-832003516


   Yeah, my interpretation was that since @jorgecarleitao authored this code, I was treating this as "just a normal PR" (it happens to have lived somewhere else for a while, but from an IP provenance perspective it seemed no different from a normal PR to me).
   
   However, I am not an expert in such matters. 
   
   





[GitHub] [arrow-datafusion] andygrove commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-826350827


   Thank you @jorgecarleitao I am really excited to see this and would love to see this merged into arrow-datafusion.





[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r619844498



##########
File path: python/LICENSE
##########
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Jorge Leitao

Review comment:
       This would all change to Apache 2.0. It was just the license I had up to this point in jorgecarleitao. No need for MIT here.
   
   Note that I am the sole copyright holder of this code, so I can license it to ASF in any way I want. I will need to push a commit to change this to Apache 2.0 and add headers everywhere as per requirements; I just wanted to get the ball rolling ^_^







[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r620558916



##########
File path: .github/workflows/python_test.yaml
##########
@@ -0,0 +1,41 @@
+name: Tests
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v2
+    - uses: actions-rs/toolchain@v1
+      with:
+        toolchain: nightly-2020-11-24
+        default: true
+        components: rustfmt
+    - name: Cache Cargo
+      uses: actions/cache@v2
+      with:
+        path: /home/runner/.cargo
+        key: cargo-maturin-cache-
+    - name: Cache Rust dependencies
+      uses: actions/cache@v2
+      with:
+        path: /home/runner/target
+        key: target-maturin-cache-
+    - uses: actions/setup-python@v2
+      with:
+        python-version: '3.7'
+    - name: Install Python dependencies
+      run: python -m pip install --upgrade pip setuptools wheel
+    - name: Run tests
+      run: |
+        cd python/
+        export CARGO_HOME="/home/runner/.cargo"
+        export CARGO_TARGET_DIR="/home/runner/target"
+
+        python -m venv venv
+        source venv/bin/activate
+
+        pip install maturin==0.8.2 toml==0.10.1 pyarrow==1.0.0

Review comment:
       shame 😄 it should work on a higher version, I just did not get the time to bump it. It uses the c data interface, which was already stable by v1.
   







[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-831972645


   @wesm thank you. Not a nuisance at all, it is important to have this done correctly.
   
   The rationale here:
   
   I hold the copyright over the whole code base, except for a one-word typo fix on the README. The code was MIT licensed on jorgecarleitao/python-datafusion.
   
   As part of this PR, I pushed a commit that added the license headers to every file in the source code. As copyright holder, I thereby licensed all this code to ASF under the ICA.





[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-826350048


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#69](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2455f63) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/2423ff0dd1fe9c0932c1cb8d1776efa3acd69554?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2423ff0) will **decrease** coverage by `0.66%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head 2455f63 differs from pull request most recent head ab17019. Consider uploading reports for the commit ab17019 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/69/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master      #69      +/-   ##
   ==========================================
   - Coverage   76.43%   75.77%   -0.67%     
   ==========================================
     Files         135      142       +7     
     Lines       23264    23467     +203     
   ==========================================
     Hits        17782    17782              
   - Misses       5482     5685     +203     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [python/src/context.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9jb250ZXh0LnJz) | `0.00% <0.00%> (ø)` | |
   | [python/src/dataframe.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9kYXRhZnJhbWUucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/errors.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9lcnJvcnMucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/expression.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9leHByZXNzaW9uLnJz) | `0.00% <0.00%> (ø)` | |
   | [python/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9zY2FsYXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/types.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy90eXBlcy5ycw==) | `0.00% <0.00%> (ø)` | |
   | [python/src/udaf.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy91ZGFmLnJz) | `0.00% <0.00%> (ø)` | |
   | ... and [4 more](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [2423ff0...ab17019](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r620562060



##########
File path: .github/workflows/python_build.yml
##########
@@ -0,0 +1,72 @@
+name: Build

Review comment:
       Yep, we will need to work out the packaging; building the wheels is IMO still relevant, as it is not so easy in Rust (as far as I understood, support for this is still a bit WIP). Building the manylinux wheels was a feat.







[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-829978863


   Ok, I have now fixed the CI, pushed the license headers, and bumped to latest datafusion.
   
   There was a regression, documented in #226.
   
   Once we fix the regression, this can be released in pypi as 0.2.2 since there were no backward incompatible changes on it 🎉 





[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r619840306



##########
File path: python/Cargo.toml
##########
@@ -0,0 +1,39 @@
+[package]
+name = "datafusion"
+version = "0.2.1"
+authors = ["Jorge C. Leitao <jo...@gmail.com>"]

Review comment:
       ```suggestion
   authors = ["Apache Arrow <de...@arrow.apache.org>"]
   ```

##########
File path: python/LICENSE
##########
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Jorge Leitao

Review comment:
       This presumably needs to change to an ASF copyright?

##########
File path: python/LICENSE
##########
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Jorge Leitao

Review comment:
       Also, I assume we need to license under ASL 2.0 as well

##########
File path: python/LICENSE
##########
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Jorge Leitao

Review comment:
       I'm unsure whether we can use an MIT license in an ASF project, and I wasn't able to find answers right away on the ASF site, but we can ask the incubator folks for advice on this.
   
   What is the reason for the MIT license here? Is this something that is expected in the Python ecosystem?










[GitHub] [arrow-datafusion] wesm commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-831989472


   Thanks. I'm not enough of an expert to know what the correct protocol is; a vote may not be needed at all, but let's double check.





[GitHub] [arrow-datafusion] h-vetinari commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
h-vetinari commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r624515066



##########
File path: python/README.md
##########
@@ -0,0 +1,127 @@
+## DataFusion in Python
+
+This is a Python library that binds to [Apache Arrow](https://arrow.apache.org/) in-memory query engine [DataFusion](https://github.com/apache/arrow/tree/master/rust/datafusion).
+
+Like pyspark, it allows you to build a plan through SQL or a DataFrame API against in-memory data, parquet or CSV files, run it in a multi-threaded environment, and obtain the result back in Python.
+
+It also allows you to use UDFs and UDAFs for complex operations.
+
+The major advantage of this library over other execution engines is that this library achieves zero-copy between Python and its execution engine: there is no cost in using UDFs, UDAFs, and collecting the results to Python apart from having to lock the GIL when running those operations.
+
+Its query engine, DataFusion, is written in [Rust](https://www.rust-lang.org/), which makes strong assumptions about thread safety and lack of memory leaks.
+
+Technically, zero-copy is achieved via the [c data interface](https://arrow.apache.org/docs/format/CDataInterface.html).
+
+## How to use it
+
+Simple usage:
+
+```python
+import datafusion
+import pyarrow
+
+# an alias
+f = datafusion.functions
+
+# create a context
+ctx = datafusion.ExecutionContext()
+
+# create a RecordBatch and a new DataFrame from it
+batch = pyarrow.RecordBatch.from_arrays(
+    [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
+    names=["a", "b"],
+)
+df = ctx.create_dataframe([[batch]])
+
+# create a new statement
+df = df.select(
+    f.col("a") + f.col("b"),
+    f.col("a") - f.col("b"),
+)
+
+# execute and collect the first (and only) batch
+result = df.collect()[0]
+
+assert result.column(0) == pyarrow.array([5, 7, 9])
+assert result.column(1) == pyarrow.array([-3, -3, -3])
+```
+
+### UDFs
+
+```python
+def is_null(array: pyarrow.Array) -> pyarrow.Array:
+    return array.is_null()
+
+udf = f.udf(is_null, [pyarrow.int64()], pyarrow.bool_())
+
+df = df.select(udf(f.col("a")))
+```
+
+### UDAF
+
+```python
+import pyarrow
+import pyarrow.compute
+
+
+class Accumulator:
+    """
+    Interface of a user-defined accumulation.
+    """
+    def __init__(self):
+        self._sum = pyarrow.scalar(0.0)
+
+    def to_scalars(self) -> [pyarrow.Scalar]:
+        return [self._sum]
+
+    def update(self, values: pyarrow.Array) -> None:
+        # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
+        self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(values).as_py())
+
+    def merge(self, states: pyarrow.Array) -> None:
+        # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
+        self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(states).as_py())
+
+    def evaluate(self) -> pyarrow.Scalar:
+        return self._sum
+
+
+df = ...
+
+udaf = f.udaf(Accumulator, pyarrow.float64(), pyarrow.float64(), [pyarrow.float64()])
+
+df = df.aggregate(
+    [],
+    [udaf(f.col("a"))]
+)
+```
+
+## How to install
+
+```bash
+pip install datafusion
+```

Review comment:
       @xhochy 
   If you want, you can ping me on the staged-recipes PR once you create it. I was just reading up on the state of Arrow vs. Rust and was surprised that datafusion isn't in conda-forge yet. ;-)
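
The `update`/`merge`/`evaluate` protocol from the UDAF section of the README quoted above can be sketched without pyarrow. The version below is a pure-Python stand-in, not the datafusion API: the class name `NullSafeSum` and the use of plain lists in place of `pyarrow.Array` are assumptions made for illustration. It skips `None` entries, which is one possible way around the "breaks on `None`" caveat noted in the README comments.

```python
from typing import Iterable, List, Optional


class NullSafeSum:
    """Hypothetical pure-Python analogue of the README's Accumulator."""

    def __init__(self) -> None:
        self._sum = 0.0

    def to_scalars(self) -> List[float]:
        # The intermediate state is a single value: the running sum.
        return [self._sum]

    def update(self, values: Iterable[Optional[float]]) -> None:
        # Fold a batch of raw input values into the state, ignoring nulls.
        self._sum += sum(v for v in values if v is not None)

    def merge(self, states: Iterable[Optional[float]]) -> None:
        # Fold partial states from other accumulators into this one.
        self._sum += sum(s for s in states if s is not None)

    def evaluate(self) -> float:
        return self._sum


# Each partition accumulates independently...
left, right = NullSafeSum(), NullSafeSum()
left.update([1.0, None, 2.0])
right.update([None, 4.0])

# ...and the partial states are then merged into a final accumulator.
total = NullSafeSum()
total.merge(left.to_scalars())
total.merge(right.to_scalars())
print(total.evaluate())
```

Keeping `update` and `merge` separate mirrors the README's interface: the first consumes input rows, the second consumes states produced by `to_scalars` on other accumulators.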







[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r619841910



##########
File path: python/LICENSE
##########
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Jorge Leitao

Review comment:
       I'm unsure whether we can use an MIT license in an ASF project. I wasn't able to find an answer right away on the ASF site, but we can ask the incubator folks for advice.
   
   What is the reason for the MIT license here? Is this something that is expected in the Python ecosystem?







[GitHub] [arrow-datafusion] Dandandan commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-831681861


   @andygrove please go ahead 🚀





[GitHub] [arrow-datafusion] wesm commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-831971257


   It's probably best to check with general@incubator to determine the preferred protocol in this situation. I don't want to subject you to unneeded process, but it would be good to go by the book.





[GitHub] [arrow-datafusion] andygrove merged pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove merged pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69


   





[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r619844498



##########
File path: python/LICENSE
##########
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Jorge Leitao

Review comment:
       This would all change to Apache 2.0. MIT was just the license I had used up to this point in my own repository (jorgecarleitao/datafusion-python). No need for MIT here.
   
   Note that I am the sole copyright holder of this code, so I can license it to the ASF in any way I want. I will need to push a commit to change this to Apache 2.0 and add headers everywhere as per the requirements; I just wanted to get the ball rolling ^_^







[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r619840306



##########
File path: python/Cargo.toml
##########
@@ -0,0 +1,39 @@
+[package]
+name = "datafusion"
+version = "0.2.1"
+authors = ["Jorge C. Leitao <jo...@gmail.com>"]

Review comment:
       ```suggestion
   authors = ["Apache Arrow <de...@arrow.apache.org>"]
   ```







[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-826356367


   Some notes:
   
   * this points to a rather old commit in DataFusion. We need to work on that
   * the CI did not run for some reason; I need to fix that
   * I need to push the Apache license and headers to flag that this is being licensed to ASF via Apache 2 and not MIT





[GitHub] [arrow-datafusion] andygrove commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-831969631


   > I hate to be a nuisance, but didn't this need to go through IP clearance?
   
   We can revert if that is the case. But because Jorge was the only contributor (apart from one contribution fixing a typo in a README), IP clearance didn't seem to be required here?





[GitHub] [arrow-datafusion] wesm commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-831967240


   I hate to be a nuisance, but didn't this need to go through IP clearance?





[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-826350048


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#69](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5fc75a0) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/9ba214a52ed78c57d3d6363c61a88893d41fe906?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9ba214a) will **decrease** coverage by `0.54%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/69/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master      #69      +/-   ##
   ==========================================
   - Coverage   76.24%   75.70%   -0.55%     
   ==========================================
     Files         134      141       +7     
     Lines       23051    23216     +165     
   ==========================================
     Hits        17576    17576              
   - Misses       5475     5640     +165     
   ```
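
The percentages in the table above follow from dividing hits by lines. A minimal sketch of that arithmetic, using only the numbers shown in the report (the helper name `coverage_percent` is illustrative and not part of Codecov):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage: covered lines over total lines."""
    return 100.0 * hits / lines


# Figures taken from the Coverage Diff table.
base = coverage_percent(hits=17576, lines=23051)  # master
head = coverage_percent(hits=17576, lines=23216)  # master plus this PR

added_lines = 23216 - 23051  # 165 new lines
added_hits = 17576 - 17576   # none of the new lines are covered
diff_coverage = coverage_percent(added_hits, added_lines)

print(f"base={base:.4f}% head={head:.4f}% "
      f"change={head - base:+.4f} diff={diff_coverage:.2f}%")
```

Unrounded, the two values are roughly 76.248% and 75.706%, a drop of about 0.54 percentage points, which is consistent with the summary line. The `76.24%` and `75.70%` shown in the table look truncated rather than rounded, though that is an inference from the numbers and not something the report states.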
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [python/src/context.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9jb250ZXh0LnJz) | `0.00% <0.00%> (ø)` | |
   | [python/src/dataframe.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9kYXRhZnJhbWUucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/errors.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9lcnJvcnMucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/expression.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9leHByZXNzaW9uLnJz) | `0.00% <0.00%> (ø)` | |
   | [python/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9zY2FsYXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/types.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy90eXBlcy5ycw==) | `0.00% <0.00%> (ø)` | |
   | [python/src/udaf.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy91ZGFmLnJz) | `0.00% <0.00%> (ø)` | |
   | [datafusion/src/physical\_plan/expressions/case.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9leHByZXNzaW9ucy9jYXNlLnJz) | `72.91% <0.00%> (-0.39%)` | :arrow_down: |
   | [benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmVuY2htYXJrcy9zcmMvYmluL3RwY2gucnM=) | `35.07% <0.00%> (-0.08%)` | :arrow_down: |
   | [ballista/rust/client/src/context.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jbGllbnQvc3JjL2NvbnRleHQucnM=) | `0.00% <0.00%> (ø)` | |
   | ... and [7 more](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [9ba214a...5fc75a0](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-datafusion] andygrove commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-831977116


   PR to revert: https://github.com/apache/arrow-datafusion/pull/257





[GitHub] [arrow-datafusion] andygrove commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-831627866


   @alamb @Dandandan Any objection to merging this PR?





[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r620560132



##########
File path: python/tests/test_df.py
##########
@@ -0,0 +1,98 @@
+import unittest

Review comment:
       It comes with Python, so there is no need to install anything else. No strong feelings here, though; we can refactor this whole thing. =)
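
To make the trade-off concrete: a `unittest.TestCase` needs nothing beyond the standard library, and pytest can also collect and run such classes unchanged, so starting with `unittest` does not rule out a later move. The sketch below is illustrative only. `add_columns` and `sub_columns` are hypothetical stand-ins for the `f.col("a") + f.col("b")` and `f.col("a") - f.col("b")` expressions in the README, operating on plain lists so that the example runs without datafusion or pyarrow.

```python
import unittest


def add_columns(a, b):
    # Hypothetical stand-in for an element-wise column addition.
    return [x + y for x, y in zip(a, b)]


def sub_columns(a, b):
    # Hypothetical stand-in for an element-wise column subtraction.
    return [x - y for x, y in zip(a, b)]


class TestSelect(unittest.TestCase):
    def setUp(self):
        self.a = [1, 2, 3]
        self.b = [4, 5, 6]

    def test_add(self):
        self.assertEqual(add_columns(self.a, self.b), [5, 7, 9])

    def test_sub(self):
        self.assertEqual(sub_columns(self.a, self.b), [-3, -3, -3])


def run_suite() -> unittest.TestResult:
    # Load and run the tests programmatically with the stdlib runner.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSelect)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

The same file can be run with `python -m unittest` or handed to pytest, which discovers `TestCase` subclasses alongside its own function-style tests.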







[GitHub] [arrow-datafusion] xhochy commented on a change in pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
xhochy commented on a change in pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#discussion_r620234294



##########
File path: .github/workflows/python_build.yml
##########
@@ -0,0 +1,72 @@
+name: Build

Review comment:
       The tag release probably won't work in the context of an ASF repo anymore?

##########
File path: python/README.md
##########
@@ -0,0 +1,127 @@
+## DataFusion in Python
+
+This is a Python library that binds to [Apache Arrow](https://arrow.apache.org/) in-memory query engine [DataFusion](https://github.com/apache/arrow/tree/master/rust/datafusion).
+
+Like pyspark, it allows you to build a plan through SQL or a DataFrame API against in-memory data, parquet or CSV files, run it in a multi-threaded environment, and obtain the result back in Python.
+
+It also allows you to use UDFs and UDAFs for complex operations.
+
+The major advantage of this library over other execution engines is that this library achieves zero-copy between Python and its execution engine: there is no cost in using UDFs, UDAFs, and collecting the results to Python apart from having to lock the GIL when running those operations.
+
+Its query engine, DataFusion, is written in [Rust](https://www.rust-lang.org/), which makes strong assumptions about thread safety and lack of memory leaks.
+
+Technically, zero-copy is achieved via the [c data interface](https://arrow.apache.org/docs/format/CDataInterface.html).
+
+## How to use it
+
+Simple usage:
+
+```python
+import datafusion
+import pyarrow
+
+# an alias
+f = datafusion.functions
+
+# create a context
+ctx = datafusion.ExecutionContext()
+
+# create a RecordBatch and a new DataFrame from it
+batch = pyarrow.RecordBatch.from_arrays(
+    [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
+    names=["a", "b"],
+)
+df = ctx.create_dataframe([[batch]])
+
+# create a new statement
+df = df.select(
+    f.col("a") + f.col("b"),
+    f.col("a") - f.col("b"),
+)
+
+# execute and collect the first (and only) batch
+result = df.collect()[0]
+
+assert result.column(0) == pyarrow.array([5, 7, 9])
+assert result.column(1) == pyarrow.array([-3, -3, -3])
+```
+
+### UDFs
+
+```python
+def is_null(array: pyarrow.Array) -> pyarrow.Array:
+    return array.is_null()
+
+udf = f.udf(is_null, [pyarrow.int64()], pyarrow.bool_())
+
+df = df.select(udf(f.col("a")))
+```
+
+### UDAF
+
+```python
+import pyarrow
+import pyarrow.compute
+
+
+class Accumulator:
+    """
+    Interface of a user-defined accumulation.
+    """
+    def __init__(self):
+        self._sum = pyarrow.scalar(0.0)
+
+    def to_scalars(self) -> [pyarrow.Scalar]:
+        return [self._sum]
+
+    def update(self, values: pyarrow.Array) -> None:
+        # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
+        self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(values).as_py())
+
+    def merge(self, states: pyarrow.Array) -> None:
+        # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
+        self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(states).as_py())
+
+    def evaluate(self) -> pyarrow.Scalar:
+        return self._sum
+
+
+df = ...
+
+udaf = f.udaf(Accumulator, pyarrow.float64(), pyarrow.float64(), [pyarrow.float64()])
+
+df = df.aggregate(
+    [],
+    [udaf(f.col("a"))]
+)
+```
+
+## How to install
+
+```bash
+pip install datafusion
+```

Review comment:
       Adding this here as a suggestion, but I'll take a look at packaging it as a conda package. I'll cc you on the PR once I have something working.
   
   ```suggestion
   ```
   
   or via `conda`/`mamba`:
   
   ```
   conda install -c conda-forge datafusion
   mamba install -c conda-forge datafusion
   ```

##########
File path: python/tests/test_df.py
##########
@@ -0,0 +1,98 @@
+import unittest

Review comment:
       Out of curiosity: Why not `pytest`? 

##########
File path: python/Cargo.toml
##########
@@ -0,0 +1,39 @@
+[package]
+name = "datafusion"
+version = "0.2.1"
+authors = ["Jorge C. Leitao <jo...@gmail.com>"]
+description = "Build and run queries against data"
+readme = "README.md"
+repository = "https://github.com/jorgecarleitao/datafusion-python"
+license = "MIT OR Apache-2.0"
+edition = "2018"
+
+[dependencies]
+tokio = "0.2.22"
+rand = "0.7"
+pyo3 = { version = "0.12.1", features = ["extension-module"] }
+datafusion = { git = "https://github.com/apache/arrow.git", rev = "f945eba", features = ["simd"] }
+arrow = { git = "https://github.com/apache/arrow.git", rev = "f945eba", features = ["simd"] }
+
+[lib]
+name = "datafusion"
+crate-type = ["cdylib"]
+
+[package.metadata.maturin]
+requires-dist = ["pyarrow>=1"]
+
+classifier = [
+    "Development Status :: 2 - Pre-Alpha",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: Apache Software License",
+    "License :: OSI Approved",
+    "Operating System :: MacOS",
+    "Operating System :: Microsoft :: Windows",
+    "Operating System :: POSIX :: Linux",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.6",
+    "Programming Language :: Python :: 3.7",
+    "Programming Language :: Python :: 3.8",

Review comment:
       Everything listed here should also work with Python 3.9
   ```suggestion
       "Programming Language :: Python :: 3.8",
       "Programming Language :: Python :: 3.9",
   ```

##########
File path: .github/workflows/python_test.yaml
##########
@@ -0,0 +1,41 @@
+name: Tests
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v2
+    - uses: actions-rs/toolchain@v1
+      with:
+        toolchain: nightly-2020-11-24
+        default: true
+        components: rustfmt
+    - name: Cache Cargo
+      uses: actions/cache@v2
+      with:
+        path: /home/runner/.cargo
+        key: cargo-maturin-cache-
+    - name: Cache Rust dependencies
+      uses: actions/cache@v2
+      with:
+        path: /home/runner/target
+        key: target-maturin-cache-
+    - uses: actions/setup-python@v2
+      with:
+        python-version: '3.7'
+    - name: Install Python dependencies
+      run: python -m pip install --upgrade pip setuptools wheel
+    - name: Run tests
+      run: |
+        cd python/
+        export CARGO_HOME="/home/runner/.cargo"
+        export CARGO_TARGET_DIR="/home/runner/target"
+
+        python -m venv venv
+        source venv/bin/activate
+
+        pip install maturin==0.8.2 toml==0.10.1 pyarrow==1.0.0

Review comment:
       `pyarrow=1.0` 😭 What's holding this back?







[GitHub] [arrow-datafusion] andygrove commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-826350827


   Thank you @jorgecarleitao! I am really excited to see this and would love to see it merged into arrow-datafusion.





[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #69: Add datafusion-python

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #69:
URL: https://github.com/apache/arrow-datafusion/pull/69#issuecomment-826350048


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#69](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5fc75a0) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/9ba214a52ed78c57d3d6363c61a88893d41fe906?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9ba214a) will **decrease** coverage by `0.54%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/69/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master      #69      +/-   ##
   ==========================================
   - Coverage   76.24%   75.70%   -0.55%     
   ==========================================
     Files         134      141       +7     
     Lines       23051    23216     +165     
   ==========================================
     Hits        17576    17576              
   - Misses       5475     5640     +165     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [python/src/context.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9jb250ZXh0LnJz) | `0.00% <0.00%> (ø)` | |
   | [python/src/dataframe.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9kYXRhZnJhbWUucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/errors.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9lcnJvcnMucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/expression.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9leHByZXNzaW9uLnJz) | `0.00% <0.00%> (ø)` | |
   | [python/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9zY2FsYXIucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/types.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy90eXBlcy5ycw==) | `0.00% <0.00%> (ø)` | |
   | [python/src/udaf.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy91ZGFmLnJz) | `0.00% <0.00%> (ø)` | |
   | [datafusion/src/physical\_plan/expressions/case.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9leHByZXNzaW9ucy9jYXNlLnJz) | `72.91% <0.00%> (-0.39%)` | :arrow_down: |
   | [benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmVuY2htYXJrcy9zcmMvYmluL3RwY2gucnM=) | `35.07% <0.00%> (-0.08%)` | :arrow_down: |
   | [ballista/rust/client/src/context.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jbGllbnQvc3JjL2NvbnRleHQucnM=) | `0.00% <0.00%> (ø)` | |
   | ... and [7 more](https://codecov.io/gh/apache/arrow-datafusion/pull/69/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Last update [9ba214a...5fc75a0](https://codecov.io/gh/apache/arrow-datafusion/pull/69?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   

