Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/25 07:47:45 UTC

[GitHub] [arrow] AlenkaF opened a new pull request, #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

AlenkaF opened a new pull request, #14492:
URL: https://github.com/apache/arrow/pull/14492

   Pass `**kwargs` in `read_feather` to `to_pandas()` to ensure `timestamp_as_object=True` (together with other `**kwargs`) can be passed and the conversion between Arrow and pandas doesn't fail due to different datetime resolutions.
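
   The resolution mismatch mentioned above can be illustrated with plain arithmetic. The following is a minimal stdlib-only sketch, assuming the pandas side stores timestamps as signed 64-bit nanosecond counts since the Unix epoch (an assumption about the pandas default at the time of this PR); the helper names `ns_since_epoch` and `fits_in_int64_ns` are hypothetical and not part of pyarrow or pandas. It checks the two dates used in the test for this change.

```python
from datetime import datetime

# Bounds of a signed 64-bit integer, the assumed storage for nanosecond timestamps.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)


def ns_since_epoch(value: datetime) -> int:
    """Return nanoseconds between the Unix epoch and a naive datetime (hypothetical helper)."""
    delta = value - EPOCH
    # Use integer arithmetic only, so no float rounding is involved.
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


def fits_in_int64_ns(value: datetime) -> bool:
    """Check whether the datetime is representable as int64 nanoseconds (hypothetical helper)."""
    return INT64_MIN <= ns_since_epoch(value) <= INT64_MAX


if __name__ == "__main__":
    for text in ("1654-01-01", "1920-01-01"):
        print(text, fits_in_int64_ns(datetime.fromisoformat(text)))
```

   Under that assumption, the year-1654 value falls outside the nanosecond range while the 1920 value fits, which is why converting such a column to Python `datetime` objects with `timestamp_as_object=True` is a way around the failure.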


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14492:
URL: https://github.com/apache/arrow/pull/14492#discussion_r1006527107


##########
python/pyarrow/tests/test_feather.py:
##########
@@ -838,3 +838,26 @@ def test_preserve_index_pandas(version):
         expected = df
 
     _check_pandas_roundtrip(df, expected, version=version)
+
+
+@pytest.mark.pandas
+def test_feather_datetime_resolution_arrow_to_pandas(datadir):
+    # ARROW-17192 - ensure timestamp_as_object=True (together with other
+    # **kwargs) can be passed in read_feather to to_pandas.
+
+    # file generated with:
+    #   from datetime import datetime
+    #   df = pd.DataFrame({"date": [
+    #       datetime.fromisoformat("1654-01-01"),
+    #       datetime.fromisoformat("1920-01-01"), ],
+    #   })
+    #   df.to_feather(datadir / "test_resolution.feather")

Review Comment:
   Is there a reason to not do this on the fly in the test in a tempdir?





[GitHub] [arrow] jorisvandenbossche commented on pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #14492:
URL: https://github.com/apache/arrow/pull/14492#issuecomment-1293779233

   Updated test looks good! 
   The data file is still committed, though; it can be removed now.




[GitHub] [arrow] jorisvandenbossche commented on pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #14492:
URL: https://github.com/apache/arrow/pull/14492#issuecomment-1290603011

   @AlenkaF feel free to merge once CI is fixed and you can update this PR to get a green CI run.




[GitHub] [arrow] AlenkaF commented on pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
AlenkaF commented on PR #14492:
URL: https://github.com/apache/arrow/pull/14492#issuecomment-1294426703

   OK, this should be ready now. Can merge when the CI is green.




[GitHub] [arrow] AlenkaF commented on pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
AlenkaF commented on PR #14492:
URL: https://github.com/apache/arrow/pull/14492#issuecomment-1292404315

   The release jobs keep failing even after rebasing. The macOS failure has a Jira issue: https://issues.apache.org/jira/browse/ARROW-18150.
   
   None of these failures seem to be connected to this PR.




[GitHub] [arrow] AlenkaF commented on a diff in pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
AlenkaF commented on code in PR #14492:
URL: https://github.com/apache/arrow/pull/14492#discussion_r1006737517


##########
python/pyarrow/tests/test_feather.py:
##########
@@ -838,3 +838,26 @@ def test_preserve_index_pandas(version):
         expected = df
 
     _check_pandas_roundtrip(df, expected, version=version)
+
+
+@pytest.mark.pandas
+def test_feather_datetime_resolution_arrow_to_pandas(datadir):
+    # ARROW-17192 - ensure timestamp_as_object=True (together with other
+    # **kwargs) can be passed in read_feather to to_pandas.
+
+    # file generated with:
+    #   from datetime import datetime
+    #   df = pd.DataFrame({"date": [
+    #       datetime.fromisoformat("1654-01-01"),
+    #       datetime.fromisoformat("1920-01-01"), ],
+    #   })
+    #   df.to_feather(datadir / "test_resolution.feather")

Review Comment:
   There was a failing test but the issue was in the old version of pandas and `.to_feather()`, not in generating the file in the test (as I was mistakenly presuming, thanks for the help here!).
   
   I will keep it as it was and use `pyarrow.feather.write_feather` instead of pandas' `.to_feather()`.






[GitHub] [arrow] ursabot commented on pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14492:
URL: https://github.com/apache/arrow/pull/14492#issuecomment-1309529321

   Benchmark runs are scheduled for baseline = 94cf74f3b5b58a0bb45e7123ea907f2f21319776 and contender = 28a1152a5b0a5a58bcf24bdfd6aea54ee7282360. 28a1152a5b0a5a58bcf24bdfd6aea54ee7282360 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/ba1b289b9e324a7888f84ffc1e173610...89e8b535ae5b4a6a956240ba8a7b43a0/)
   [Finished :arrow_down:0.37% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/6b0cebc4553149d0be4813e84419db67...42767672940143c683a4c7dce42d939d/)
   [Finished :arrow_down:0.27% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/ce4608dba660432f9d2835fa70b438ca...1367e73e04594910a9bd3d4d58b3c208/)
   [Finished :arrow_down:0.6% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/eda4170daab94028b5948f344ac38ff0...9fbb3b13b1ae4964a1cff5797138228e/)
   Buildkite builds:
   [Finished] [`28a1152a` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1838)
   [Finished] [`28a1152a` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1860)
   [Finished] [`28a1152a` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1826)
   [Finished] [`28a1152a` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1852)
   [Finished] [`94cf74f3` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1837)
   [Finished] [`94cf74f3` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1859)
   [Finished] [`94cf74f3` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1825)
   [Finished] [`94cf74f3` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1851)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] adrienpacifico commented on pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
adrienpacifico commented on PR #14492:
URL: https://github.com/apache/arrow/pull/14492#issuecomment-1313342293

   Thank you very much @AlenkaF !




[GitHub] [arrow] github-actions[bot] commented on pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14492:
URL: https://github.com/apache/arrow/pull/14492#issuecomment-1290256477

   https://issues.apache.org/jira/browse/ARROW-17192




[GitHub] [arrow] AlenkaF merged pull request #14492: ARROW-17192: [Python] Pass **kwargs in read_feather to to_pandas()

Posted by GitBox <gi...@apache.org>.
AlenkaF merged PR #14492:
URL: https://github.com/apache/arrow/pull/14492

