Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/09/19 02:30:00 UTC
[jira] [Assigned] (ARROW-6573) [Python] Segfault when writing to parquet
[ https://issues.apache.org/jira/browse/ARROW-6573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney reassigned ARROW-6573:
-----------------------------------
Assignee: Wes McKinney
> [Python] Segfault when writing to parquet
> -----------------------------------------
>
> Key: ARROW-6573
> URL: https://issues.apache.org/jira/browse/ARROW-6573
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.14.1
> Environment: Ubuntu 16.04. Pyarrow 0.14.1 installed through pip. Using Anaconda distribution of Python 3.7.
> Reporter: Josh Weinstock
> Assignee: Wes McKinney
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.15.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When writing a pyarrow table to Parquet, I observe a segfault if the types declared in the schema do not match the types of the data.
> Here is a reproducible example:
>
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> data = dict()
> data["key"] = [0, 1, 2, 3] # segfault
> #data["key"] = ["0", "1", "2", "3"] # no segfault
> schema = pa.schema({"key" : pa.string()})
> table = pa.Table.from_pydict(data, schema = schema)
> print("now writing out test file")
> pq.write_table(table, "test.parquet")
> {code}
> This results in a segfault when writing the table. Running
>
> {code:bash}
> gdb -ex r --args python test.py
> {code}
> yields:
>
>
> {noformat}
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007fffe8173917 in virtual thunk to parquet::DictEncoderImpl<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) () from /net/fantasia/home/jweinstk/anaconda3/lib/python3.7/site-packages/pyarrow/libparquet.so.14
> {noformat}
> {noformat}
>
>
> Thanks for all of your arrow work,
> Josh
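
On affected versions, one possible way to avoid the crash described above is to check the column values against the intended types before building the table. The following is a minimal sketch only: it is plain Python, it does not call pyarrow, and `find_type_mismatches`, `coerce_columns`, and `expected_types` are hypothetical names introduced for illustration, not part of any library.

```python
def find_type_mismatches(data, expected_types):
    """Return {column: [row indices]} for values that are not of the expected type.

    `expected_types` maps column names to Python types (a hypothetical
    stand-in for the Arrow schema). None is treated as a null and skipped.
    """
    mismatches = {}
    for name, values in data.items():
        expected = expected_types[name]
        bad = [i for i, v in enumerate(values)
               if v is not None and not isinstance(v, expected)]
        if bad:
            mismatches[name] = bad
    return mismatches


def coerce_columns(data, expected_types):
    """Return a copy of `data` with every non-null value converted to the expected type."""
    return {
        name: [None if v is None else expected_types[name](v) for v in values]
        for name, values in data.items()
    }


# Same data as in the report: integers destined for a string column.
data = {"key": [0, 1, 2, 3]}
expected_types = {"key": str}

mismatches = find_type_mismatches(data, expected_types)
fixed = coerce_columns(data, expected_types)
```

Here `mismatches` flags every row of `key`, and `fixed` holds string values that match a `pa.string()` column, corresponding to the "no segfault" variant in the report.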
--
This message was sent by Atlassian Jira
(v8.3.4#803005)