Posted to jira@arrow.apache.org by "Martin Thøgersen (Jira)" <ji...@apache.org> on 2022/01/27 15:56:00 UTC

[jira] [Created] (ARROW-15484) kwargs fails for pyarrow.parquet.write_to_dataset()

Martin Thøgersen created ARROW-15484:
----------------------------------------

             Summary: kwargs fails for pyarrow.parquet.write_to_dataset()
                 Key: ARROW-15484
                 URL: https://issues.apache.org/jira/browse/ARROW-15484
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet, Python
    Affects Versions: 6.0.1
            Reporter: Martin Thøgersen


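When keyword arguments such as `basename_template` or `existing_data_behavior` are passed to `pyarrow.parquet.write_to_dataset()` with `use_legacy_dataset=False`, the call fails with a `TypeError`. As the traceback below shows, `write_to_dataset()` forwards all `**kwargs` to `ParquetFileFormat.make_write_options()`, which accepts only Parquet writer options, so dataset-level options like `basename_template` are rejected.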
 
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

df = pd.DataFrame({
    'int': [1, 2],
    'str': ['a', 'b']
})

table = pa.Table.from_pandas(df)

"""
**kwargs : dict,
    Additional kwargs for write_table function. See docstring for write_table or ParquetWriter for more information.
"""
pq.write_to_dataset(table, root_path='foo',
                    use_legacy_dataset=False,
                    # kwargs:
                    basename_template="prefix-{i}.parquet",
                    existing_data_behavior="error"
                    )
{code}
{noformat}
TypeError                                 Traceback (most recent call last)
...test.py in <module>
     16     Additional kwargs for write_table function. See docstring for write_table or ParquetWriter for more information.
     17 """
---> 18 pq.write_to_dataset(table, root_path='foo',
     19                     use_legacy_dataset=False,
     20                     # kwargs:

...lib/python3.8/site-packages/pyarrow/parquet.py in write_to_dataset(table, root_path, partition_cols, partition_filename_cb, filesystem, use_legacy_dataset, **kwargs)
   2144         # map format arguments
   2145         parquet_format = ds.ParquetFileFormat()
-> 2146         write_options = parquet_format.make_write_options(**kwargs)
   2147 
   2148         # map old filesystems to new one

...lib/python3.8/site-packages/pyarrow/_dataset.pyx in pyarrow._dataset.ParquetFileFormat.make_write_options()

...lib/python3.8/site-packages/pyarrow/_dataset.pyx in pyarrow._dataset.ParquetFileWriteOptions.update()

TypeError: unexpected parquet write option: basename_template
{noformat}
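Until `write_to_dataset()` routes these options correctly, one possible workaround is to call `pyarrow.dataset.write_dataset()` directly. Dataset-level options such as `basename_template` and `existing_data_behavior` are keyword arguments of `write_dataset()` itself, while Parquet writer options go through `ParquetFileFormat.make_write_options()`. This is a minimal sketch, assuming a pyarrow version whose `write_dataset()` accepts `existing_data_behavior` (6.0 or later); the helper name `write_with_dataset_api` and the temporary output directory are hypothetical, and `compression='snappy'` is only an example writer option.

{code:python}
import os
import tempfile

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq


def write_with_dataset_api(table, base_dir):
    """Hypothetical helper: write `table` as Parquet via write_dataset()."""
    parquet_format = ds.ParquetFileFormat()
    # Parquet writer options (as accepted by ParquetWriter) belong here;
    # compression='snappy' is only an example.
    file_options = parquet_format.make_write_options(compression='snappy')
    # Dataset-level options are keyword arguments of write_dataset() itself,
    # not Parquet write options, so they are passed directly.
    ds.write_dataset(
        table,
        base_dir,
        format=parquet_format,
        file_options=file_options,
        basename_template="prefix-{i}.parquet",
        existing_data_behavior="error",
    )


table = pa.table({'int': [1, 2], 'str': ['a', 'b']})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, 'foo')
    write_with_dataset_api(table, out_dir)
    # List the files produced and read the dataset back as a sanity check.
    written = sorted(os.listdir(out_dir))
    round_trip = pq.read_table(out_dir)
{code}

Against a fresh directory this should produce a single `prefix-0.parquet` file; calling the helper again on the same non-empty directory should raise an error because of `existing_data_behavior="error"`.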



--
This message was sent by Atlassian Jira
(v8.20.1#820001)