You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/04/20 14:33:42 UTC

[GitHub] [arrow] jorisvandenbossche opened a new pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

jorisvandenbossche opened a new pull request #6992:
URL: https://github.com/apache/arrow/pull/6992


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-618459883


   I'm OK with this. The maintenance burden of supporting several years' worth of pandas releases seems like a lot to bear. If there are parties which are affected by this they should contribute to the maintenance (either monetarily or with their time)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-618366876


   cc @wesm @xhochy @BryanCutler are you fine with
   
   1) a hard required minimal pandas version? (meaning: we don't use the pandas integration if an older version is installed, instead of trying to use it and potentially erroring/failing in certain cases)
   2) if ok with 1, pandas 0.23 as minimal version? (released 2 years ago, but our tests still pass with eg 0.21 as well)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-618674669


   I further cleaned up the shim to remove if/else checks we no longer need, so should be ready now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-616601667


   https://issues.apache.org/jira/browse/ARROW-7950


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #6992:
URL: https://github.com/apache/arrow/pull/6992#discussion_r413740270



##########
File path: python/pyarrow/pandas-shim.pxi
##########
@@ -55,6 +55,16 @@ cdef class _PandasAPIShim(object):
         from distutils.version import LooseVersion
         self._loose_version = LooseVersion(pd.__version__)
 
+        if self._loose_version < LooseVersion('0.23.0'):
+            self._have_pandas = False
+            if raise_:
+                raise ImportError(
+                    "pyarrow requires pandas 0.23.0 or above, pandas {} is "
+                    "installed".format(self._version)
+                )
+            else:
+                return

Review comment:
       Yes, that's a good idea




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-618537844


   Actually I'll hold off on merging this to confirm that @jorisvandenbossche has done everything that he planned


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-616595450


   This is still WIP (depending on which pandas version we choose, we can clean up some things in the pandas-shim.pxi), but:
   
   - I defined the minimal pandas version now as pandas 0.23 (released 2 years ago). But, I also ran tests with pandas 0.22 and pandas 0.21, and tests are still passing for those. So we *could* also set this minimal version at 0.21 (the mailing list thread that triggered this issue reported an error on pandas 0.20). But personally, I think 0.23 is more than fine.
   - I now raise an error when the pandas version is too low. But, since it is actually working (mostly) for lower versions as well, we could also only give a warning instead. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-618678293


   Sweet thanks, merging now


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #6992:
URL: https://github.com/apache/arrow/pull/6992#discussion_r412929368



##########
File path: python/pyarrow/pandas-shim.pxi
##########
@@ -55,6 +55,16 @@ cdef class _PandasAPIShim(object):
         from distutils.version import LooseVersion
         self._loose_version = LooseVersion(pd.__version__)
 
+        if self._loose_version < LooseVersion('0.23.0'):
+            self._have_pandas = False
+            if raise_:
+                raise ImportError(
+                    "pyarrow requires pandas 0.23.0 or above, pandas {} is "
+                    "installed".format(self._version)
+                )
+            else:
+                return

Review comment:
       Could we perhaps emit a warning here? I don't think that users expect their Pandas installation to be silently ignored.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] xhochy commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
xhochy commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-618538122


   👍 
   
   2 years ago released `pandas` version still sounds very generous. People who cannot upgrade from that to a newer version will probably have the same problems with `pyarrow` updates.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #6992:
URL: https://github.com/apache/arrow/pull/6992#discussion_r412929769



##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -2685,8 +2685,8 @@ class A:
             'a': pd.period_range('2000-01-01', periods=20),
         })
 
-        expected_msg = 'Conversion failed for column a with type period'
-        with pytest.raises(TypeError, match=expected_msg):
+        expected_msg = 'Conversion failed for column a with type period|object'

Review comment:
       Do you mean `(period|object)`? (parentheses included)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-618517973


   Sounds good to me. FWIW, Spark also has a minimum Pandas version set at 0.23.2.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-618536764


   The Appveyor failure is unrelated


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org