Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/23 05:05:21 UTC

[GitHub] [arrow] arw2019 opened a new pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

arw2019 opened a new pull request #8244:
URL: https://github.com/apache/arrow/pull/8244


   xref https://github.com/apache/arrow/pull/6849#discussion_r404160096
   
   This is a minor refactor that replaces uses of `pandas` DataFrames with `pa.Table` objects wherever possible.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#discussion_r497508500



##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -128,19 +128,22 @@ def test_dataset(version):
     num_values = (100, 100)

Review comment:
       I think the `@pytest.mark.pandas` can be removed here as well?

##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -128,19 +128,22 @@ def test_dataset(version):
     num_values = (100, 100)
     num_files = 5
     paths = [random_path() for i in range(num_files)]
-    df = pd.DataFrame(np.random.randn(*num_values),
-                      columns=['col_' + str(i)
-                               for i in range(num_values[1])])
+    table = pa.Table.from_arrays(

Review comment:
       In principle we could also use `pa.table(...)` here instead (a bit more ergonomic to use)







[GitHub] [arrow] arw2019 commented on pull request #8244: ARROW-8355: [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#issuecomment-705951731


   @jorisvandenbossche @emkornfield This is ready to go:
   - [x] addressed comments
   - [x] updated title & description
   - [x] rebased (CI green)





[GitHub] [arrow] github-actions[bot] commented on pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#issuecomment-697136388


   https://issues.apache.org/jira/browse/ARROW-8355





[GitHub] [arrow] jorisvandenbossche closed pull request #8244: ARROW-8355: [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche closed pull request #8244:
URL: https://github.com/apache/arrow/pull/8244


   





[GitHub] [arrow] emkornfield commented on pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#issuecomment-704769151


   @jorisvandenbossche can this be merged now?





[GitHub] [arrow] jorisvandenbossche commented on pull request #8244: ARROW-8355: [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#issuecomment-706225605


   Thanks @arw2019 !





[GitHub] [arrow] jorisvandenbossche commented on pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#issuecomment-704898306


   I don't think @arw2019 has pushed the update yet? (although the comments have been answered)





[GitHub] [arrow] arw2019 commented on pull request #8244: ARROW-8355: [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#issuecomment-706229088


   Thanks @jorisvandenbossche for reviewing!








[GitHub] [arrow] arw2019 commented on a change in pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#discussion_r501111040



##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -128,19 +128,22 @@ def test_dataset(version):
     num_values = (100, 100)

Review comment:
       Done







[GitHub] [arrow] arw2019 commented on a change in pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#discussion_r499335730



##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -128,19 +128,22 @@ def test_dataset(version):
     num_values = (100, 100)
     num_files = 5
     paths = [random_path() for i in range(num_files)]
-    df = pd.DataFrame(np.random.randn(*num_values),
-                      columns=['col_' + str(i)
-                               for i in range(num_values[1])])
+    table = pa.Table.from_arrays(

Review comment:
       will switch to that







[GitHub] [arrow] arw2019 commented on a change in pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8244:
URL: https://github.com/apache/arrow/pull/8244#discussion_r501246193



##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -128,19 +128,22 @@ def test_dataset(version):
     num_values = (100, 100)

Review comment:
       For this to work I had to remove the call to `_check_pandas_version` from `FeatherDataset.__init__`. I think that call was a bug, since `FeatherDataset` is supposed to work without `pandas`. I added a call to `_check_pandas_version` in `FeatherDataset.read_pandas` instead, since we do want the check there.
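
The pattern described here, moving an optional-dependency check out of the constructor and into the one method that needs it, can be sketched with the standard library alone. This is not pyarrow's implementation: `LazyDataset`, `_check_optional_dependency`, and the `optional_module` parameter are hypothetical names used only for illustration.

```python
import importlib.util


def _check_optional_dependency(module_name):
    """Raise ImportError if the named module cannot be found (hypothetical helper)."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(f"{module_name} is required for this operation")


class LazyDataset:
    """Hypothetical stand-in for a dataset class with an optional dependency."""

    def __init__(self, paths, optional_module="pandas"):
        # No dependency check here: construction must work without the extra.
        self.paths = list(paths)
        self._optional_module = optional_module

    def read_table(self):
        # Core path: relies only on what is always available.
        return [f"table from {p}" for p in self.paths]

    def read_pandas(self):
        # The check lives only in the method that actually needs the dependency.
        _check_optional_dependency(self._optional_module)
        return self.read_table()
```

With this layout, a test that only builds the dataset and calls `read_table` needs no pandas marker, while `read_pandas` still fails with a clear `ImportError` when the dependency is missing.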



