Posted to issues@arrow.apache.org by "jleibs (via GitHub)" <gi...@apache.org> on 2023/04/19 16:40:42 UTC

[GitHub] [arrow] jleibs opened a new issue, #35237: RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed

jleibs opened a new issue, #35237:
URL: https://github.com/apache/arrow/issues/35237

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The problem is relatively straightforward: a thread that is still running during interpreter shutdown triggers an import that tries to register an atexit handler, which Python rejects once shutdown has begun.
   
   This only happens if the `pandas` library is installed, which causes pyarrow's pandas shims to be used. It happens regardless of whether the application actually uses pandas.
   
   The problem can be avoided by joining all threads before main exits, but Python does not generally require this, so it should be considered a bug.
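
A minimal sketch of the join-based workaround described above, using only the standard library. The helper `run_and_join` and the stand-in workload `use_library` are hypothetical names introduced for illustration; in the report the workload would be `use_pyarrow`.

```python
import threading


def run_and_join(work, timeout=None):
    """Run `work` in a thread and wait for it before returning its result."""
    results = []
    t = threading.Thread(target=lambda: results.append(work()))
    t.start()
    # Joining here means main cannot reach interpreter shutdown
    # while the worker is still running (unless `timeout` expires first).
    t.join(timeout)
    return results[0] if results else None


def use_library():
    # Hypothetical stand-in for use_pyarrow() from the report.
    return sum([1, 2, 3])


if __name__ == "__main__":
    print(run_and_join(use_library))
```

This only sidesteps the error; it does not address the underlying import that happens during shutdown.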
   
   ### requirements.txt
   ```
   pandas==2.0.0
   pyarrow==11.0.0
   ```
   
   ### main.py
   ```
   import threading
   import pyarrow
   
   
   def use_pyarrow() -> None:
       table = pyarrow.table({"a": [1, 2, 3]})
   
   
   def main() -> None:
       t = threading.Thread(target=use_pyarrow, args=())
       t.start()
   
   if __name__ == "__main__":
       main()
   ```
   
   ### Run:
   ```
   $ python main.py 
   Traceback (most recent call last):
     File "pyarrow/pandas-shim.pxi", line 100, in pyarrow.lib._PandasAPIShim._check_import
     File "pyarrow/pandas-shim.pxi", line 48, in pyarrow.lib._PandasAPIShim._import_pandas
     File "/home/jleibs/pyarrow-repro/venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 24, in <module>
       import concurrent.futures.thread  # noqa
     File "/usr/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
       threading._register_atexit(_python_exit)
     File "/usr/lib/python3.10/threading.py", line 1504, in _register_atexit
       raise RuntimeError("can't register atexit after shutdown")
   RuntimeError: can't register atexit after shutdown
   Exception ignored in: 'pyarrow.lib._PandasAPIShim._have_pandas_internal'
   Traceback (most recent call last):
     File "pyarrow/pandas-shim.pxi", line 100, in pyarrow.lib._PandasAPIShim._check_import
     File "pyarrow/pandas-shim.pxi", line 48, in pyarrow.lib._PandasAPIShim._import_pandas
     File "/home/jleibs/pyarrow-repro/venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 24, in <module>
       import concurrent.futures.thread  # noqa
     File "/usr/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
       threading._register_atexit(_python_exit)
     File "/usr/lib/python3.10/threading.py", line 1504, in _register_atexit
       raise RuntimeError("can't register atexit after shutdown")
   RuntimeError: can't register atexit after shutdown
   ```
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35237:
URL: https://github.com/apache/arrow/issues/35237#issuecomment-1854240438

   cc @pitrou 


Re: [I] [Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #35237:
URL: https://github.com/apache/arrow/issues/35237#issuecomment-1838131615

   Just to note, I was able to reproduce the error with `pandas==2.1.1` and a dev version of pyarrow (`15.0.0.dev`). The issue doesn't happen if I add the import that Weston suggested at the top level of the program.
   


[GitHub] [arrow] westonpace commented on issue #35237: RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35237:
URL: https://github.com/apache/arrow/issues/35237#issuecomment-1516770159

   As a workaround, does it work if you add:
   
   ```
   from pyarrow.pandas_compat import _pandas_api
   ```
   
   at the top level of your program (before main shuts down)?
   
   This seems to be a transitive consequence of https://github.com/python/cpython/issues/86813#issuecomment-1246097184
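
The idea behind this workaround can be sketched with the standard library alone. In the traceback from the report, the failing call is `threading._register_atexit`, reached while `concurrent.futures.thread` is being imported. If that module is imported in the main thread before shutdown, a later import from a worker thread finds it in `sys.modules` and does not execute the module body again. The names `PRELOADED`, `worker`, and `start_worker` are hypothetical, and whether preloading this one module is enough for pyarrow is an assumption not confirmed in this thread; the suggested `pyarrow.pandas_compat` import is the workaround that was actually reported to help.

```python
import concurrent.futures.thread  # noqa: F401  (imported early for its side effect)
import sys
import threading

PRELOADED = "concurrent.futures.thread"


def worker(seen):
    # The module is already cached in sys.modules, so this import does not
    # re-run the module body that registers the atexit hook.
    import concurrent.futures.thread  # noqa: F401
    seen.append(PRELOADED in sys.modules)


def start_worker():
    """Start the worker without joining it; return the thread and its output list."""
    seen = []
    t = threading.Thread(target=worker, args=(seen,))
    t.start()
    return t, seen


if __name__ == "__main__":
    start_worker()
```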


Re: [I] [Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35237:
URL: https://github.com/apache/arrow/issues/35237#issuecomment-1854537505

   This should be trivial to work around in PyArrow.

