Posted to issues@arrow.apache.org by "jleibs (via GitHub)" <gi...@apache.org> on 2023/04/19 16:40:42 UTC
[GitHub] [arrow] jleibs opened a new issue, #35237: RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed
jleibs opened a new issue, #35237:
URL: https://github.com/apache/arrow/issues/35237
### Describe the bug, including details regarding any error messages, version, and platform.
This is a relatively straightforward problem in which a thread that is continuing to run during shutdown tries to register an atexit handler.
This only happens if the `pandas` library is installed, which causes pyarrow's pandas shims to be used. It happens regardless of whether the application itself uses pandas.
The problem can be avoided by making sure to join all threads before main exits, but Python does not generally require this, so it should be considered a bug.
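A minimal sketch of the join-based avoidance, using only the standard library. The deferred `import concurrent.futures.thread` is a stand-in (an assumption for illustration) for the import that `pyarrow/pandas_compat.py` performs in the traceback below; `lazy_worker` and `run_joined` are hypothetical names, not part of pyarrow.

```python
import threading


def lazy_worker(results: list) -> None:
    # Stand-in for pyarrow's lazy pandas shim: the import runs inside the
    # worker thread. Importing this module registers a threading atexit
    # hook, which is only allowed before interpreter shutdown begins.
    import concurrent.futures.thread  # noqa: F401

    results.append("imported")


def run_joined() -> list:
    results: list = []
    t = threading.Thread(target=lazy_worker, args=(results,))
    t.start()
    # Joining guarantees the deferred import finishes while the main
    # thread is still alive, so shutdown has not started yet.
    t.join()
    return results
```

Calling `run_joined()` from `main()` keeps the worker's import ahead of shutdown, at the cost of requiring every such thread to be tracked and joined.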
### requirements.txt
```
pandas==2.0.0
pyarrow==11.0.0
```
### main.py
```
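import threading

import pyarrow

def use_pyarrow() -> None:
    # Runs in the worker thread; per the traceback below, this reaches
    # pyarrow's pandas shim, which imports pandas_compat on first use.
    table = pyarrow.table({"a": [1, 2, 3]})

def main() -> None:
    # The thread is started but deliberately never joined.
    t = threading.Thread(target=use_pyarrow, args=())
    t.start()

if __name__ == "__main__":
    main()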
```
### Run:
```
$ python main.py
Traceback (most recent call last):
  File "pyarrow/pandas-shim.pxi", line 100, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow/pandas-shim.pxi", line 48, in pyarrow.lib._PandasAPIShim._import_pandas
  File "/home/jleibs/pyarrow-repro/venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 24, in <module>
    import concurrent.futures.thread # noqa
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.10/threading.py", line 1504, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
Exception ignored in: 'pyarrow.lib._PandasAPIShim._have_pandas_internal'
Traceback (most recent call last):
  File "pyarrow/pandas-shim.pxi", line 100, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow/pandas-shim.pxi", line 48, in pyarrow.lib._PandasAPIShim._import_pandas
  File "/home/jleibs/pyarrow-repro/venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 24, in <module>
    import concurrent.futures.thread # noqa
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.10/threading.py", line 1504, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
```
### Component(s)
Python
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] [Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed [arrow]
Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35237:
URL: https://github.com/apache/arrow/issues/35237#issuecomment-1854240438
cc @pitrou
Re: [I] [Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed [arrow]
Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #35237:
URL: https://github.com/apache/arrow/issues/35237#issuecomment-1838131615
Just to note, I was able to reproduce the error with `pandas==2.1.1` and a dev version of pyarrow (`15.0.0.dev`). The issue doesn't happen if I add the import Weston suggested at the top level of the program.
[GitHub] [arrow] westonpace commented on issue #35237: RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35237:
URL: https://github.com/apache/arrow/issues/35237#issuecomment-1516770159
As a workaround, does it work if you add:
```
from pyarrow.pandas_compat import _pandas_api
```
at the top level of your program (before main shuts down)?
This seems to be a transitive consequence of https://github.com/python/cpython/issues/86813#issuecomment-1246097184
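A rough sketch of why an eager top-level import can help, using only the standard library. It assumes the failing step is the first-time execution of `concurrent.futures.thread` (as the traceback in the issue suggests); once the module is cached in `sys.modules`, a later import from a thread does not re-run its body. `late_worker`, `start_unjoined`, and `outcome` are hypothetical names, and whether this alone covers everything pyarrow's pandas shim does is not confirmed here.

```python
import sys
import threading
import time

# Eager import on the main thread, while atexit registration is still allowed.
import concurrent.futures.thread  # noqa: F401

outcome: list = []


def late_worker() -> None:
    # Give the main thread time to finish first, as in the report.
    time.sleep(0.2)
    # The module is already in sys.modules, so its body (including the
    # atexit registration) is not executed a second time.
    import concurrent.futures.thread  # noqa: F401

    outcome.append("ok")


def start_unjoined() -> threading.Thread:
    # Start the worker without joining it; the caller may simply return.
    t = threading.Thread(target=late_worker)
    t.start()
    return t
```

In a script, calling `start_unjoined()` and letting main return mirrors the original reproduction, except that the import has already happened before shutdown.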
Re: [I] [Python] RuntimeError when using pyarrow from a thread which is not joined before main exits when pandas is installed [arrow]
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35237:
URL: https://github.com/apache/arrow/issues/35237#issuecomment-1854537505
This should be trivial to work around in PyArrow.