Posted to issues@arrow.apache.org by "Josh Weinstock (Jira)" <ji...@apache.org> on 2019/09/16 19:02:00 UTC
[jira] [Created] (ARROW-6573) Segfault when writing to parquet
Josh Weinstock created ARROW-6573:
-------------------------------------
Summary: Segfault when writing to parquet
Key: ARROW-6573
URL: https://issues.apache.org/jira/browse/ARROW-6573
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 0.14.1
Environment: Ubuntu 16.04. Pyarrow 0.14.1 installed through pip. Using Anaconda distribution of Python 3.7.
Reporter: Josh Weinstock
When writing a pyarrow table to Parquet, I observe a segfault when the column data's types do not match the declared schema.
Here is a reproducible example:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
data = dict()
data["key"] = [0, 1, 2, 3] # segfault
#data["key"] = ["0", "1", "2", "3"] # no segfault
schema = pa.schema({"key" : pa.string()})
table = pa.Table.from_pydict(data, schema = schema)
print("now writing out test file")
pq.write_table(table, "test.parquet")
{code}
This results in a segfault when writing the table. Running
{noformat}
gdb -ex r --args python test.py
{noformat}
Yields
{noformat}
Program received signal SIGSEGV, Segmentation fault. 0x00007fffe8173917 in virtual thunk to parquet::DictEncoderImpl<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) () from /net/fantasia/home/jweinstk/anaconda3/lib/python3.7/site-packages/pyarrow/libparquet.so.14
{noformat}
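Until the mismatch is rejected by pyarrow itself, one workaround is to validate the column values against the declared types before constructing the table, so a bad column raises a clear Python exception instead of reaching the native Parquet writer. Below is a minimal sketch of such a check; `check_columns` and the type-name mapping are hypothetical helpers, not part of the pyarrow API:

{code:python}
# Hypothetical defensive check (not part of pyarrow): verify that each
# column's Python values are compatible with the declared Arrow type
# name before handing the data to pyarrow.Table.from_pydict.

EXPECTED_PYTHON_TYPES = {
    "string": str,    # corresponds to pa.string()
    "int64": int,     # corresponds to pa.int64()
    "double": float,  # corresponds to pa.float64()
}

def check_columns(data, declared_types):
    """Raise TypeError if any non-null value's Python type disagrees
    with the declared Arrow type name for its column."""
    for name, type_name in declared_types.items():
        expected = EXPECTED_PYTHON_TYPES[type_name]
        for value in data[name]:
            if value is not None and not isinstance(value, expected):
                raise TypeError(
                    f"column {name!r}: value {value!r} does not match "
                    f"declared type {type_name!r}"
                )

# The failing case from the repro above: ints declared as string.
data = {"key": [0, 1, 2, 3]}
try:
    check_columns(data, {"key": "string"})
except TypeError as exc:
    print("caught before writing:", exc)
{code}

With a check like this in front of `pa.Table.from_pydict`, the mismatch surfaces as a `TypeError` in Python rather than a crash inside libparquet.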
Thanks for all of your arrow work,
Josh
--
This message was sent by Atlassian Jira
(v8.3.2#803003)