You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/06 10:33:16 UTC

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #275: add postgres comparison to python test

Jimexist opened a new pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275


   # Which issue does this PR close?
   
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   
   Closes #.
   
    # Rationale for this change
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   
   # What changes are included in this PR?
   
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   
   # Are there any user-facing changes?
   
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist edited a comment on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
Jimexist edited a comment on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833573089


   > I am wondering whether the SQL integration tests should be part of the Python bindings or just within DataFusion (using the Rust API or some endpoint)? I think there is nothing wrong with having some SQL tests in the Python bindings too, but I would expect if we have a big suite of SQL tests to compare with PostgreSQL, this would be outside of the Python part.
   
   IMO if the python wrapper would be thin, almost always up to date with rust library, and stay that way, having it at the python level makes more sense.
   
   Happy to move this out, but maybe in a separate PR later, when things fledge out more thoroughly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833562688


   I think having comparison tests against PostgreSQL is super useful :+1: 
   
   I am wondering whether the SQL integration tests should be part of the Python bindings or just within DataFusion (using the Rust API or some endpoint)? I think there is nothing wrong with having some SQL tests in the Python bindings too, but I would expect if we have a big suite of SQL tests to compare with PostgreSQL, this would be outside of the Python part.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#discussion_r627655000



##########
File path: .github/workflows/python_test.yaml
##########
@@ -21,38 +21,57 @@ on: [push, pull_request]
 jobs:
   test:
     runs-on: ubuntu-latest
+    services:
+      postgres:
+        image: postgres:13
+        env:
+          POSTGRES_PASSWORD: postgres
+          POSTGRES_DB: db_test
+        ports:
+          - 5432/tcp
+        options: >-
+          --health-cmd pg_isready
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
     steps:
-    - uses: actions/checkout@v2
-    - name: Setup Rust toolchain
-      run: |
-        rustup toolchain install nightly-2021-01-06
-        rustup default nightly-2021-01-06
-        rustup component add rustfmt
-    - name: Cache Cargo
-      uses: actions/cache@v2
-      with:
-        path: /home/runner/.cargo
-        key: cargo-maturin-cache-
-    - name: Cache Rust dependencies
-      uses: actions/cache@v2
-      with:
-        path: /home/runner/target
-        key: target-maturin-cache-
-    - uses: actions/setup-python@v2
-      with:
-        python-version: '3.7'
-    - name: Install Python dependencies
-      run: python -m pip install --upgrade pip setuptools wheel
-    - name: Run tests
-      run: |
-        cd python/
-        export CARGO_HOME="/home/runner/.cargo"
-        export CARGO_TARGET_DIR="/home/runner/target"
+      - uses: actions/checkout@v2
+      - name: Setup Rust toolchain
+        run: |
+          rustup toolchain install nightly-2021-01-06
+          rustup default nightly-2021-01-06
+          rustup component add rustfmt
+      - name: Cache Cargo
+        uses: actions/cache@v2
+        with:
+          path: /home/runner/.cargo
+          key: cargo-maturin-cache-
+      - name: Cache Rust dependencies
+        uses: actions/cache@v2
+        with:
+          path: /home/runner/target
+          key: target-maturin-cache-
+      - uses: actions/setup-python@v2
+        with:
+          python-version: "3.8"
+      - name: Install Python dependencies
+        run: python -m pip install --upgrade pip setuptools wheel
+      - name: Run tests
+        run: |
+          cd python/
+          export CARGO_HOME="/home/runner/.cargo"
+          export CARGO_TARGET_DIR="/home/runner/target"
 
-        python -m venv venv
-        source venv/bin/activate
+          python -m venv venv
+          source venv/bin/activate
 
-        pip install maturin==0.10.4 toml==0.10.1 pyarrow==4.0.0
-        maturin develop
+          # TODO move to use poetry

Review comment:
       I am curious about this `TODO` ? I am not familiar with poetry vs maturin. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833968156


   work continues at https://github.com/apache/arrow-datafusion/pull/281


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist closed pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
Jimexist closed pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#discussion_r627681688



##########
File path: .github/workflows/python_test.yaml
##########
@@ -21,38 +21,57 @@ on: [push, pull_request]
 jobs:
   test:
     runs-on: ubuntu-latest
+    services:
+      postgres:
+        image: postgres:13
+        env:
+          POSTGRES_PASSWORD: postgres
+          POSTGRES_DB: db_test
+        ports:
+          - 5432/tcp
+        options: >-
+          --health-cmd pg_isready
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
     steps:
-    - uses: actions/checkout@v2
-    - name: Setup Rust toolchain
-      run: |
-        rustup toolchain install nightly-2021-01-06
-        rustup default nightly-2021-01-06
-        rustup component add rustfmt
-    - name: Cache Cargo
-      uses: actions/cache@v2
-      with:
-        path: /home/runner/.cargo
-        key: cargo-maturin-cache-
-    - name: Cache Rust dependencies
-      uses: actions/cache@v2
-      with:
-        path: /home/runner/target
-        key: target-maturin-cache-
-    - uses: actions/setup-python@v2
-      with:
-        python-version: '3.7'
-    - name: Install Python dependencies
-      run: python -m pip install --upgrade pip setuptools wheel
-    - name: Run tests
-      run: |
-        cd python/
-        export CARGO_HOME="/home/runner/.cargo"
-        export CARGO_TARGET_DIR="/home/runner/target"
+      - uses: actions/checkout@v2
+      - name: Setup Rust toolchain
+        run: |
+          rustup toolchain install nightly-2021-01-06
+          rustup default nightly-2021-01-06
+          rustup component add rustfmt
+      - name: Cache Cargo
+        uses: actions/cache@v2
+        with:
+          path: /home/runner/.cargo
+          key: cargo-maturin-cache-
+      - name: Cache Rust dependencies
+        uses: actions/cache@v2
+        with:
+          path: /home/runner/target
+          key: target-maturin-cache-
+      - uses: actions/setup-python@v2
+        with:
+          python-version: "3.8"
+      - name: Install Python dependencies
+        run: python -m pip install --upgrade pip setuptools wheel
+      - name: Run tests
+        run: |
+          cd python/
+          export CARGO_HOME="/home/runner/.cargo"
+          export CARGO_TARGET_DIR="/home/runner/target"
 
-        python -m venv venv
-        source venv/bin/activate
+          python -m venv venv
+          source venv/bin/activate
 
-        pip install maturin==0.10.4 toml==0.10.1 pyarrow==4.0.0
-        maturin develop
+          # TODO move to use poetry

Review comment:
       poetry is a python package to resolve dependency trees. Basically `pip` is kind of "broken" in some aspects, and [poetry](https://python-poetry.org/) (together with conda, [pip-env](https://pypi.org/project/pipenv/) and [pip-tools](https://github.com/jazzband/pip-tools)) are addressing some of problems with `pip`.
   
   In this context, it addresses the line below, `pip install ...`.
   
   My understanding is that poetry is used in the context of Python applications, and this is a Python library. I can understand its use for resolving dependencies in testing the library, but something like [`tox`](https://tox.readthedocs.io/en/latest/) would probably make more sense, even though it is not so easy because this is a compiled Python library.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833558100


   > I run `python -m unittest discover tests` locally. With this PR, this will harness the postgres tests, which will fail.
   
   let me add https://github.com/apache/arrow-datafusion/pull/275/files#diff-45af8b13785466b13a07a2988427b31801a716a88dc53e579d58f4a3d4219cc9R32 to skip locally


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833570412


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/275?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#275](https://codecov.io/gh/apache/arrow-datafusion/pull/275?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b340e31) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/3be087a78846beffdbc4a9f80c73938fa18d24a7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3be087a) will **not change** coverage.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head b340e31 differs from pull request most recent head e89e93d. Consider uploading reports for the commit e89e93d to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/275/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/275?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #275   +/-   ##
   =======================================
     Coverage   76.21%   76.21%           
   =======================================
     Files         140      140           
     Lines       23553    23553           
   =======================================
     Hits        17950    17950           
     Misses       5603     5603           
   ```
   
   
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/275?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/275?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [3be087a...e89e93d](https://codecov.io/gh/apache/arrow-datafusion/pull/275?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833551220


   > What do you think about moving the postgres tests to a different directory? They seem to be more like integration tests against a service that needs to be up, as opposed to the unit tests that are available in the package.
   
   maybe leaving it to another PR? I don't believe this shall be an integration test, but rather a parity check which doesn't require any long standing processes (apart from the postgres within the CI)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833573089


   > I am wondering whether the SQL integration tests should be part of the Python bindings or just within DataFusion (using the Rust API or some endpoint)? I think there is nothing wrong with having some SQL tests in the Python bindings too, but I would expect if we have a big suite of SQL tests to compare with PostgreSQL, this would be outside of the Python part.
   
   IMO if the python wrapper would be thin, almost always up to date with rust library, and stay that way, having it at the python level makes more sense.
   
   Happy to move this out, but maybe in a separate PR later, when things fledge out more thoroughly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833955352


   > Thanks a lot for this PR.
   > 
   > I agree that these tests are very useful; I disagree with the approach taken here.
   > 
   > First, postgres tests are not specific to the python implementation and should be addressed at DataFusion; e.g. by calling `datafusion-cli`; the Python part of this repository corresponds to the Python bindings, and the tests should correspond solely to that part of the code base (i.e. FFI works, mapping of RecordBatches work, API works, etc).
   > 
   > If we need to use Python to run tests against postgres (e.g. due to a client), then let's create a separate directory that has a Python script that calls `datafusion-cli` and postgres client.
   > 
   > IMO this also makes testing more difficult because it requires building the Python bindings to run then; imo testing DataFusion against a postgres specification engine should not require building Python bindings.
   
   that's a fair point, let me close this and take another approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833554581


   I run `python -m unittest discover tests` locally. With this PR, this will harness the postgres tests, which will fail.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #275: add postgres comparison to python test

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #275:
URL: https://github.com/apache/arrow-datafusion/pull/275#issuecomment-833529064


   What do you think about moving the postgres tests to a different directory? They seem to be more like integration tests against a service that needs to be up, as opposed to the unit tests that are available in the package.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org