Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/21 16:22:41 UTC

[GitHub] [arrow] pitrou opened a new pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

pitrou opened a new pull request #10769:
URL: https://github.com/apache/arrow/pull/10769


   Pandas will try to import PyArrow if it appears to be available.
   However, the currently installed PyArrow may not be compatible with the most recently compiled Arrow C++
   (e.g. when using `archery benchmark diff ...`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#discussion_r681161372



##########
File path: dev/archery/archery/compat.py
##########
@@ -49,3 +50,10 @@ def _stringify_path(path):
             return str(path)
 
     raise TypeError("not a path-like object")
+
+
+def _import_pandas():
+    # ARROW-13425: avoid importing PyArrow from Pandas
+    sys.modules['pyarrow'] = None

Review comment:
       Sorry, I don't have a clear idea either. Does setting this to None make pyarrow non-importable in general (including for other packages that might get imported after pandas)?
   In general archery shouldn't use pyarrow (directly or indirectly), so I think either way is fine.
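
   As a minimal illustration of the question above (not part of the PR): a `None` entry in `sys.modules` makes any later import of that name fail, whoever performs it, until the entry is removed. The sketch uses the standard-library module `colorsys` purely as a stand-in for pyarrow so that it runs without Arrow installed; the helper `is_importable` is hypothetical.

```python
import importlib
import sys


def is_importable(name):
    """Return True if `name` can currently be imported (hypothetical helper)."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


name = "colorsys"  # stand-in for "pyarrow", chosen only so the demo runs anywhere

# Start from a clean slate, remembering any already-imported module object.
saved = sys.modules.pop(name, None)

before = is_importable(name)   # normal import works

sys.modules[name] = None       # same trick as in the diff above
blocked = is_importable(name)  # the import machinery now refuses the import

del sys.modules[name]          # dropping the None entry lifts the block
after = is_importable(name)

if saved is not None:
    sys.modules[name] = saved  # leave the interpreter as we found it
```

   The block is process-wide: it is keyed on the module name, not on the importer, so it applies equally to pandas and to anything imported after it.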







[GitHub] [arrow] pitrou commented on pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#issuecomment-884320305


   @amol- do you want to review this?





[GitHub] [arrow] pitrou closed pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10769:
URL: https://github.com/apache/arrow/pull/10769


   





[GitHub] [arrow] jorisvandenbossche commented on pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#issuecomment-891777391


   Yes, it's certainly still useful to prevent it here anyway (also because we should support the current/released pandas). But I will open an issue in pandas about conditionally importing `pyarrow.compute` (it will probably be rare for someone to have a pyarrow installation without compute support, but still).





[GitHub] [arrow] pitrou commented on a change in pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#discussion_r674773928



##########
File path: dev/archery/archery/compat.py
##########
@@ -49,3 +50,10 @@ def _stringify_path(path):
             return str(path)
 
     raise TypeError("not a path-like object")
+
+
+def _import_pandas():
+    # ARROW-13425: avoid importing PyArrow from Pandas
+    sys.modules['pyarrow'] = None

Review comment:
       I have no idea what would be most desirable in this case. I would ask @jorisvandenbossche but he's on vacation :-)







[GitHub] [arrow] amol- commented on a change in pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#discussion_r674674329



##########
File path: dev/archery/archery/compat.py
##########
@@ -49,3 +50,10 @@ def _stringify_path(path):
             return str(path)
 
     raise TypeError("not a path-like object")
+
+
+def _import_pandas():
+    # ARROW-13425: avoid importing PyArrow from Pandas
+    sys.modules['pyarrow'] = None

Review comment:
       Should we unset this one after pandas has been correctly imported, to leave open the possibility for other commands to try to import pyarrow? I guess that might just reintroduce the bug somewhere else, so it might make more sense to explicitly document somewhere that `archery` can't depend on `arrow`.
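
   For reference, a rough sketch of what "unset it afterwards" could look like. This is not what the PR does; `block_import` and `can_import` are hypothetical names, and `colorsys` again stands in for pyarrow so the example is runnable without Arrow.

```python
import contextlib
import importlib
import sys

_MISSING = object()  # sentinel: the module had no sys.modules entry at all


@contextlib.contextmanager
def block_import(name):
    """Temporarily make `import name` fail (hypothetical helper)."""
    previous = sys.modules.get(name, _MISSING)
    sys.modules[name] = None  # a None entry makes the import machinery raise
    try:
        yield
    finally:
        # Restore whatever was there before, so later imports work again.
        if previous is _MISSING:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = previous


def can_import(name):
    """Return True if `name` can currently be imported (hypothetical helper)."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# "colorsys" is only a stand-in for pyarrow so the demo runs anywhere.
with block_import("colorsys"):
    inside = can_import("colorsys")   # blocked while the context is active
outside = can_import("colorsys")      # importable again afterwards
```

   The caveat from the comment still applies: the block only covers imports that happen inside the `with` body, so any library that imports pyarrow lazily at a later point would bring the original problem back.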







[GitHub] [arrow] github-actions[bot] commented on pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#issuecomment-884320344


   https://issues.apache.org/jira/browse/ARROW-13425





[GitHub] [arrow] jorisvandenbossche commented on pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#issuecomment-891217971


   This might actually also be a "bug" in pandas. For basic usage of pandas, it only tries to import pyarrow (and some submodules) but doesn't actually do anything with it. So a failing import of pyarrow shouldn't mess with basic pandas usage.
   
   For the traceback in the JIRA, it seems that pandas checks that pyarrow can be imported (which can go fine, even if it's not properly (re)built). Once that succeeds, it assumes it can also import submodules like `pyarrow.compute`. But in *theory*, you can have a pyarrow install without the compute module enabled. So *maybe* pandas should still do the `pyarrow.compute` import inside a try/except.
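
   A generic sketch of the try/except idea, under stated assumptions: this is not pandas' actual code, `optional_import` is a hypothetical helper, and catching only `ImportError` assumes that is how the failure surfaces. The `json` names are used solely so the example runs without pyarrow.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it cannot be imported.

    Hypothetical helper: each optional submodule is guarded on its own,
    instead of assuming that a successful top-level import implies that
    every submodule is importable too.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Stand-ins for "pyarrow.compute": one submodule that exists, one that does not.
decoder = optional_import("json.decoder")
missing = optional_import("json.hypothetical_missing_submodule")

# Callers then branch on availability rather than crashing at import time.
has_decoder = decoder is not None
has_missing = missing is not None
```

   With this pattern, a missing or broken optional submodule only disables the features that need it, rather than breaking the import of the whole package.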





[GitHub] [arrow] pitrou commented on pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#issuecomment-891238547


   Even if there may be an issue in Pandas, it may still want to use PyArrow for whatever operations we ask Pandas to do (e.g. reading a JSON or CSV file). So I think it's useful to prevent it from doing so.





[GitHub] [arrow] pitrou commented on a change in pull request #10769: ARROW-13425: [Archery] Avoid importing PyArrow indirectly

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10769:
URL: https://github.com/apache/arrow/pull/10769#discussion_r681181290



##########
File path: dev/archery/archery/compat.py
##########
@@ -49,3 +50,10 @@ def _stringify_path(path):
             return str(path)
 
     raise TypeError("not a path-like object")
+
+
+def _import_pandas():
+    # ARROW-13425: avoid importing PyArrow from Pandas
+    sys.modules['pyarrow'] = None

Review comment:
       Ok, thank you.



