Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/03/28 13:42:41 UTC

[jira] [Commented] (ARROW-723) Arrow freezes on write if chunk_size=0

    [ https://issues.apache.org/jira/browse/ARROW-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945189#comment-15945189 ] 

Wes McKinney commented on ARROW-723:
------------------------------------

Thanks [~mangecoeur] -- I marked this for the next Arrow release. If you have time to figure out where the hang is happening and submit a patch, you are more than welcome to.

> Arrow freezes on write if chunk_size=0
> --------------------------------------
>
>                 Key: ARROW-723
>                 URL: https://issues.apache.org/jira/browse/ARROW-723
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.2.0
>         Environment: Linux, macOS
>            Reporter: Jonathan Chambers
>
> Pyarrow freezes if you set chunk_size=0 (e.g. if you forget to account for short data when setting the chunk size as a function of table length; see the example below).
> I would expect it either to handle this gracefully (e.g. fall back to the chunk_size=None behaviour) or to raise an error.
> ```
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> cols = 'A', 'B', 'C', 'D'
> row = np.arange(4)
> data = pd.DataFrame([row], columns=cols)
> table = pa.Table.from_pandas(data.reset_index(), timestamps_to_ms=True)
> # len(data) == 1, so int(len(data) / 4) == 0 and the write freezes
> pq.write_table(table, 'test.pq', chunk_size=int(len(data) / 4))
> ```
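
Until the writer itself rejects or handles a zero chunk size, a caller-side guard along the following lines might avoid the freeze. This is only a sketch: `safe_chunk_size` is a hypothetical helper, not part of pyarrow, and it assumes, as the report suggests, that passing chunk_size=None selects the writer's default behaviour.

```python
def safe_chunk_size(n_rows, n_chunks):
    """Hypothetical helper (not a pyarrow API).

    Returns a positive rows-per-chunk value, or None when the table is
    too short to split, so the writer can use its default chunking.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be a positive integer")
    size = n_rows // n_chunks  # integer division; 0 when n_rows < n_chunks
    # A chunk size of 0 is the value reported to freeze the writer,
    # so return None instead of passing 0 through.
    return size if size > 0 else None
```

With the reproduction above, the call would become `pq.write_table(table, 'test.pq', chunk_size=safe_chunk_size(len(data), 4))`, which passes None for the one-row table instead of 0.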


