Posted to jira@arrow.apache.org by "SEUNGMIN HEO (Jira)" <ji...@apache.org> on 2020/06/17 07:29:00 UTC

[jira] [Issue Comment Deleted] (ARROW-9117) [Python] Is there Pandas circular dependency problem?

     [ https://issues.apache.org/jira/browse/ARROW-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SEUNGMIN HEO updated ARROW-9117:
--------------------------------
    Comment: was deleted

(was: I think this is a deadlock problem caused by importing pandas.

 

My environment:

Python = 3.7.6

PyArrow = 0.17.1

Pandas = 1.0.3

Running with multithreading

 

My pandas deadlock situation:

i) I use the .drop() method of pyarrow.Table.

ii) I don't import pandas globally.

iii) My threads call .drop() concurrently.

iv) Each task therefore imports pandas dynamically and concurrently, and since Python's import system takes a per-module import lock, the concurrent imports deadlock.

 

I can avoid this deadlock error by importing pandas globally.
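The workaround can be sketched as follows. This is a minimal sketch, assuming the worker threads come from a concurrent.futures.ThreadPoolExecutor; the helpers preimport_modules, run_tasks and half are hypothetical names used only for illustration. The idea is to finish importing the heavy module in the main thread, so it is already fully initialised in sys.modules before any thread starts. The stdlib module fractions stands in for pandas so the sketch runs without third-party packages.

```python
import importlib
import sys
from concurrent.futures import ThreadPoolExecutor


def preimport_modules(names):
    """Hypothetical helper: import each module once in the calling thread.

    Afterwards every module is fully initialised and cached in sys.modules,
    so later imports from worker threads return the cached module instead
    of racing each other to initialise it.
    """
    return {name: importlib.import_module(name) for name in names}


def run_tasks(task, items, max_workers=4):
    """Hypothetical helper: run task(item) for every item on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))


def half(n):
    # "fractions" stands in for pandas; by the time this runs in a worker
    # thread it is already in sys.modules, so no module is initialised here.
    import fractions
    return fractions.Fraction(n, 2)


if __name__ == "__main__":
    # Step 1: import eagerly in the main thread (in my case: ["pandas"]).
    preimport_modules(["fractions"])
    assert "fractions" in sys.modules
    # Step 2: only now start the concurrent work.
    print(run_tasks(half, [1, 2, 3]))
```

Putting "import pandas" at the top of the module has the same effect, as long as that module is imported before any threads are started.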

 

 )

> [Python] Is there Pandas circular dependency problem?
> -----------------------------------------------------
>
>                 Key: ARROW-9117
>                 URL: https://issues.apache.org/jira/browse/ARROW-9117
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.1
>            Reporter: SEUNGMIN HEO
>            Priority: Major
>
> I'm using PyArrow to generate a Parquet dataset.
> Whenever I test my code, I encounter the same error:
> cannot import name 'BlockManager'.
> I know this error often occurs when there is a circular import dependency.
> This is sample code that reproduces it:
>  
>  
> {code:python}
> import pyarrow
> from pyarrow import parquet
>
> # col1, file_col2, len_rows, csv_table, generate_schema, s3_path,
> # table_name and s3_fs come from the rest of my code (not shown).
> field_col1 = pyarrow.field('col1', type=pyarrow.int64(), nullable=True, metadata=None)
> field_col2 = pyarrow.field('col2', type=pyarrow.date32(), nullable=True, metadata=None)
> col1_arr = pyarrow.array([col1] * len_rows, pyarrow.int64())
> col2_arr = pyarrow.array([file_col2] * len_rows, pyarrow.date32())
> # Insert col2 at index 0, then col1 at index 0, so the table starts with col1, col2
> csv_table = csv_table.add_column(0, field_col2, col2_arr)
> csv_table = csv_table.add_column(0, field_col1, col1_arr)
> csv_table = csv_table.cast(generate_schema(csv_table))
> # Partition by col1/col2 and name each file after the col2 date (keys[1])
> parquet.write_to_dataset(csv_table,
>                          f"{s3_path}/{table_name}",
>                          partition_cols=['col1', 'col2'],
>                          partition_filename_cb=lambda keys: keys[1].strftime("%Y-%m-%d"),
>                          filesystem=s3_fs,
>                          compression='snappy')
> {code}
> And this is the error message:
>  
> {code:java}
> Traceback (most recent call last):
>   File "pyarrow/pandas-shim.pxi", line 107, in pyarrow.lib._PandasAPIShim._check_import
>   File "pyarrow/pandas-shim.pxi", line 44, in pyarrow.lib._PandasAPIShim._import_pandas
>   File "/Users/aa/Desktop/Python/youtube-api-caller/venv/lib/python3.7/site-packages/pandas/__init__.py", line 42, in <module>
>     from pandas.core.api import *
>   File "/Users/aa/Desktop/Python/youtube-api-caller/venv/lib/python3.7/site-packages/pandas/core/api.py", line 26, in <module>
>     from pandas.core.groupby import Grouper
>   File "/Users/aa/Desktop/Python/youtube-api-caller/venv/lib/python3.7/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
>     from pandas.core.groupby.groupby import GroupBy  # noqa: F401
>   File "/Users/aa/Desktop/Python/youtube-api-caller/venv/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 37, in <module>
>     from pandas.core.frame import DataFrame
>   File "/Users/aa/Desktop/Python/youtube-api-caller/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 87, in <module>
>     from pandas.core.generic import NDFrame, _shared_docs
>   File "/Users/aa/Desktop/Python/youtube-api-caller/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 46, in <module>
>     from pandas.core.internals import BlockManager
>   File "<frozen importlib._bootstrap>", line 980, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 149, in __enter__
>   File "<frozen importlib._bootstrap>", line 94, in acquire
> _frozen_importlib._DeadlockError: deadlock detected by _ModuleLock('pandas.core.internals') at 4776509328
> cannot import name 'BlockManager' from 'pandas.core.internals'
>  
> {code}
>  
>  
> How can I solve this error?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)