Posted to issues@arrow.apache.org by "Victor Shih (JIRA)" <ji...@apache.org> on 2019/04/10 08:40:00 UTC

[jira] [Updated] (ARROW-5156) `df.to_parquet('s3://...', partition_cols=...)` fails with `'NoneType' object has no attribute '_isfilestore'`

     [ https://issues.apache.org/jira/browse/ARROW-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Shih updated ARROW-5156:
-------------------------------
    Description: 
According to [https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#partitioning-parquet-files], writing a Parquet file to S3 with `partition_cols` should work, but it fails for me. Example script:
{code:python}
import pandas as pd
import sys
print(sys.version)
print(pd.__version__)
df = pd.DataFrame([{'a': 1, 'b': 2}])
df.to_parquet('s3://my_s3_bucket/x.parquet', engine='pyarrow')
print('OK 1')
df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
print('OK 2')
{code}
Output:
{noformat}
3.5.2 (default, Feb 14 2019, 01:46:27)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)]
0.24.2
OK 1
Traceback (most recent call last):
File "./t.py", line 14, in <module>
df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 2203, in to_parquet
partition_cols=partition_cols, **kwargs)
File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 252, in to_parquet
partition_cols=partition_cols, **kwargs)
File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 118, in write
partition_cols=partition_cols, **kwargs)
File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1227, in write_to_dataset
_mkdir_if_not_exists(fs, root_path)
File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1182, in _mkdir_if_not_exists
if fs._isfilestore() and not fs.exists(path):
AttributeError: 'NoneType' object has no attribute '_isfilestore'
{noformat}
 

Original issue - [https://github.com/apache/arrow/issues/4030]
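
Not part of the original report, but a possible workaround, sketched under a couple of assumptions: s3fs is installed, AWS credentials are picked up from the usual environment/config, and `my_s3_bucket` is just the placeholder bucket from the example above. The idea is to bypass `df.to_parquet` and call `pyarrow.parquet.write_to_dataset` directly with an explicit filesystem, so the writer never has to infer one from the path:
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

df = pd.DataFrame([{'a': 1, 'b': 2}])

# Convert to an Arrow table and pass an explicit S3 filesystem, instead of
# letting write_to_dataset try to resolve one from the path. The bucket name
# is the placeholder from the example script above.
table = pa.Table.from_pandas(df)
fs = s3fs.S3FileSystem()
pq.write_to_dataset(table, 'my_s3_bucket/x2.parquet',
                    partition_cols=['a'], filesystem=fs)
{code}
Since the traceback shows `_mkdir_if_not_exists` being called with `fs` set to `None`, supplying the filesystem explicitly should sidestep the `'NoneType' object has no attribute '_isfilestore'` failure, though the pandas-to-pyarrow hand-off itself still needs a proper fix.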


> `df.to_parquet('s3://...', partition_cols=...)` fails with `'NoneType' object has no attribute '_isfilestore'`
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5156
>                 URL: https://issues.apache.org/jira/browse/ARROW-5156
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: Mac, Linux
>            Reporter: Victor Shih
>            Priority: Major
>
> According to [https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#partitioning-parquet-files], writing a Parquet file to S3 with `partition_cols` should work, but it fails for me. Example script:
> {code:python}
> import pandas as pd
> import sys
> print(sys.version)
> print(pd.__version__)
> df = pd.DataFrame([{'a': 1, 'b': 2}])
> df.to_parquet('s3://my_s3_bucket/x.parquet', engine='pyarrow')
> print('OK 1')
> df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
> print('OK 2')
> {code}
> Output:
> {noformat}
> 3.5.2 (default, Feb 14 2019, 01:46:27)
> [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)]
> 0.24.2
> OK 1
> Traceback (most recent call last):
> File "./t.py", line 14, in <module>
> df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
> File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 2203, in to_parquet
> partition_cols=partition_cols, **kwargs)
> File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 252, in to_parquet
> partition_cols=partition_cols, **kwargs)
> File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 118, in write
> partition_cols=partition_cols, **kwargs)
> File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1227, in write_to_dataset
> _mkdir_if_not_exists(fs, root_path)
> File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1182, in _mkdir_if_not_exists
> if fs._isfilestore() and not fs.exists(path):
> AttributeError: 'NoneType' object has no attribute '_isfilestore'
> {noformat}
>  
> Original issue - [https://github.com/apache/arrow/issues/4030]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)