Posted to issues@arrow.apache.org by "Josh Weinstock (Jira)" <ji...@apache.org> on 2019/09/16 19:02:00 UTC

[jira] [Created] (ARROW-6573) Segfault when writing to parquet

Josh Weinstock created ARROW-6573:
-------------------------------------

             Summary: Segfault when writing to parquet
                 Key: ARROW-6573
                 URL: https://issues.apache.org/jira/browse/ARROW-6573
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.14.1
         Environment: Ubuntu 16.04. Pyarrow 0.14.1 installed through pip. Using Anaconda distribution of Python 3.7. 
            Reporter: Josh Weinstock


When attempting to write a pyarrow table to Parquet, I observe a segfault when there is a mismatch between the schema and the data types.

Here is a reproducible example:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

data = dict()
data["key"] = [0, 1, 2, 3]  # segfault
# data["key"] = ["0", "1", "2", "3"]  # no segfault

schema = pa.schema({"key": pa.string()})

table = pa.Table.from_pydict(data, schema=schema)
print("now writing out test file")
pq.write_table(table, "test.parquet")
{code}
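For comparison, constructing the column directly with an explicit type raises a normal Python exception rather than crashing (a minimal sketch; the exact error message may vary by version), which suggests the missing validation is specific to the Table.from_pydict path:

{code:python}
import pyarrow as pa

# Passing the target type to pa.array surfaces the int/string mismatch
# immediately as an ArrowTypeError instead of deferring it to the writer
pa.array([0, 1, 2, 3], type=pa.string())
{code}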
This results in a segfault when writing the table. Running

{code:bash}
gdb -ex r --args python test.py
{code}
yields:
{noformat}
Program received signal SIGSEGV, Segmentation fault.
0x00007fffe8173917 in virtual thunk to parquet::DictEncoderImpl<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) ()
   from /net/fantasia/home/jweinstk/anaconda3/lib/python3.7/site-packages/pyarrow/libparquet.so.14
{noformat}
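Until the mismatch is rejected with a proper error, a workaround sketch (assuming the column is meant to hold strings, as the schema declares) is to convert the values to match the schema before building the table:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

data = {"key": [0, 1, 2, 3]}
schema = pa.schema({"key": pa.string()})

# Convert the Python values so the column data agrees with the declared
# schema; with matching types the write completes without a segfault
data["key"] = [str(v) for v in data["key"]]

table = pa.Table.from_pydict(data, schema=schema)
pq.write_table(table, "test.parquet")
{code}
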
Thanks for all of your Arrow work,

Josh


