Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/09 14:45:18 UTC

[GitHub] [arrow] jorisvandenbossche opened a new pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

jorisvandenbossche opened a new pull request #9966:
URL: https://github.com/apache/arrow/pull/9966


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#issuecomment-816754009


   https://issues.apache.org/jira/browse/ARROW-12314





[GitHub] [arrow] lidavidm closed pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #9966:
URL: https://github.com/apache/arrow/pull/9966


   





[GitHub] [arrow] jorisvandenbossche commented on pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#issuecomment-816735631


   @github-actions crossbow submit test-conda-python-3.7-kartothek-latest test-conda-python-3.7-kartothek-master
   





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#discussion_r611598768



##########
File path: python/pyarrow/tests/parquet/test_dataset.py
##########
@@ -972,6 +972,10 @@ def test_dataset_read_pandas(tempdir, use_legacy_dataset):
 
     tm.assert_frame_equal(result, expected)
 
+    # also be able to pass the columns as a set (ARROW-12314)
+    result = dataset.read_pandas(columns=set(columns)).to_pandas()
+    tm.assert_frame_equal(result, expected)

Review comment:
       Yes, I still needed to fix that; will do so now.







[GitHub] [arrow] jorisvandenbossche commented on pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#issuecomment-817787002


   > though it also looks like there's one more integration issue remaining.
   
   I *think* the remaining test failure can be ignored. It seems that the failure is coming from a changed error message, which was being asserted in the kartothek tests:
   
   ```
       def test_load_dataframes_columns_raises_missing(
           meta_partitions_evaluation_files_only, store_session
       ):
           mp = meta_partitions_evaluation_files_only[0]
           assert mp.file is not None
           assert mp.data is not None
           with pytest.raises(ValueError) as e:
               meta_partitions_evaluation_files_only[0].load_dataframes(
                   store=store_session, columns=["P", "L", "HORIZON", "foo", "bar"]
               )
   >       assert str(e.value) == "Columns cannot be found in stored dataframe: bar, foo"
   E       AssertionError: assert 'No match for...\nPRED: int64' == 'Columns cann...ame: bar, foo'
   E         - Columns cannot be found in stored dataframe: bar, foo
   E         + No match for FieldRef.Name(foo) in P: int64
   E         + L: int64
   E         + HORIZON: int64
   E         + PRED: int64
   ```
   
   So this is something kartothek will need to change (assuming we are OK with changing error messages, which I think we are).
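   One way the kartothek test could stop depending on pyarrow's exact wording is to assert only on what should stay stable: the exception type (the new error still passed the `pytest.raises(ValueError)` check above) and the name of a missing column. The sketch below is hypothetical: `load_columns` stands in for `load_dataframes`, and its message merely imitates the new pyarrow text shown in the traceback. With pytest, `pytest.raises(ValueError, match=...)` (a regex search on the message) is an existing alternative to the manual check.
   
   ```python
   import re
   
   SCHEMA = ["P", "L", "HORIZON", "PRED"]
   
   
   def load_columns(columns):
       """Hypothetical stand-in for a loader that rejects unknown columns.
       The message format imitates the newer pyarrow error shown above."""
       for name in columns:
           if name not in SCHEMA:
               fields = "\n".join(f"{f}: int64" for f in SCHEMA)
               raise ValueError(f"No match for FieldRef.Name({name}) in {fields}")
       return {name: [] for name in columns}
   
   
   def raises_mentioning(func, exc_type, *names):
       """Return True if func() raises exc_type and the message mentions at
       least one of `names` as a whole word; False if nothing is raised.
       Exceptions of other types propagate unchanged."""
       try:
           func()
       except exc_type as exc:
           message = str(exc)
           return any(re.search(rf"\b{re.escape(n)}\b", message) for n in names)
       return False
   
   
   # Passes with the new pyarrow-style message (it names "foo").
   assert raises_mentioning(
       lambda: load_columns(["P", "L", "HORIZON", "foo", "bar"]),
       ValueError, "foo", "bar",
   )
   ```
   
   The same check would also accept kartothek's previous message ("Columns cannot be found in stored dataframe: bar, foo"), so it survives a rewording as long as a missing column is still reported.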
   
   Will look into ignoring this one failure for the integration tests.





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#discussion_r611862849



##########
File path: ci/scripts/integration_kartothek.sh
##########
@@ -27,4 +27,5 @@ python -c "import pyarrow.parquet"
 python -c "import kartothek"
 
 pushd /kartothek
-pytest -n0 --ignore tests/cli/test_query.py
+# See ARROW-12314, test_load_dataframes_columns_raises_missing skipped because of changed error message

Review comment:
       Yes, will do







[GitHub] [arrow] jorisvandenbossche commented on pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#issuecomment-817850657


   @github-actions crossbow submit test-conda-python-3.7-kartothek-latest test-conda-python-3.7-kartothek-master





[GitHub] [arrow] github-actions[bot] commented on pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#issuecomment-818002374


   Revision: f90f2ef1437058f1b6b307e1dd2e2958acc3ab73
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-311](https://github.com/ursacomputing/crossbow/branches/all?query=actions-311)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.7-kartothek-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-311-github-test-conda-python-3.7-kartothek-latest)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-311-github-test-conda-python-3.7-kartothek-latest)|
   |test-conda-python-3.7-kartothek-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-311-github-test-conda-python-3.7-kartothek-master)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-311-github-test-conda-python-3.7-kartothek-master)|





[GitHub] [arrow] lidavidm commented on a change in pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#discussion_r611594814



##########
File path: python/pyarrow/tests/parquet/test_dataset.py
##########
@@ -972,6 +972,10 @@ def test_dataset_read_pandas(tempdir, use_legacy_dataset):
 
     tm.assert_frame_equal(result, expected)
 
+    # also be able to pass the columns as a set (ARROW-12314)
+    result = dataset.read_pandas(columns=set(columns)).to_pandas()
+    tm.assert_frame_equal(result, expected)

Review comment:
       It looks like this test needs to account for column order potentially being different, since sets (surprisingly?) didn't get deterministic ordering like dictionaries did.
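       A minimal standard-library sketch of the ordering issue. Since Python 3.7, `dict` preserves insertion order, so `dict.fromkeys` can deduplicate a column list while keeping the caller's order; a `set` of strings makes no such promise, and its iteration order may change between interpreter runs because string hashing is randomized. The helper `reorder_like` is hypothetical (not part of pyarrow or pandas) and models frames as plain dicts; in the actual test, pandas' `tm.assert_frame_equal(result, expected, check_like=True)` is one existing way to ignore column order.
       
       ```python
       # Sketch (standard library only): set-based column selections carry no
       # order guarantee, so compare results in an order-insensitive way.
       
       columns = ["a", "b", "c", "d"]
       
       # dict preserves insertion order (guaranteed since Python 3.7), so
       # dict.fromkeys removes duplicates while keeping the caller's order.
       ordered_unique = list(dict.fromkeys(columns + ["a"]))
       print(ordered_unique)  # ['a', 'b', 'c', 'd']
       
       
       def reorder_like(result, expected):
           """Hypothetical test helper: return `result` (a dict mapping column
           name -> values) with its keys arranged in the order of `expected`.
           Raises AssertionError if the two have different column names."""
           if set(result) != set(expected):
               raise AssertionError(
                   f"column mismatch: {sorted(result)} != {sorted(expected)}"
               )
           return {name: result[name] for name in expected}
       
       
       expected = {"a": [1, 2], "b": [3, 4]}
       # Simulate a result whose columns came back in set-iteration order,
       # which may or may not match the expected order.
       result = {name: expected[name] for name in set(expected)}
       reordered = reorder_like(result, expected)
       assert list(reordered) == list(expected)
       ```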




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#discussion_r611852656



##########
File path: ci/scripts/integration_kartothek.sh
##########
@@ -27,4 +27,5 @@ python -c "import pyarrow.parquet"
 python -c "import kartothek"
 
 pushd /kartothek
-pytest -n0 --ignore tests/cli/test_query.py
+# See ARROW-12314, test_load_dataframes_columns_raises_missing skipped because of changed error message

Review comment:
       Should we file an issue on their side to update the test?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9966: ARROW-12314: [Python] Accept columns as set in parquet read_pandas

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9966:
URL: https://github.com/apache/arrow/pull/9966#issuecomment-816758337


   Revision: 9249bff746e421b235d36adcdb2e48215cd92555
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-302](https://github.com/ursacomputing/crossbow/branches/all?query=actions-302)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.7-kartothek-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-302-github-test-conda-python-3.7-kartothek-latest)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-302-github-test-conda-python-3.7-kartothek-latest)|
   |test-conda-python-3.7-kartothek-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-302-github-test-conda-python-3.7-kartothek-master)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-302-github-test-conda-python-3.7-kartothek-master)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org