Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/09/06 19:42:00 UTC
[jira] [Comment Edited] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924554#comment-16924554 ]
Wes McKinney edited comment on ARROW-3933 at 9/6/19 7:41 PM:
-------------------------------------------------------------
This is still core dumping. I'm going to take a look and see if I can figure it out.
{code}
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff7805801 in __GI_abort () at abort.c:79
#2 0x00007fffa4a37eee in arrow::util::CerrLog::~CerrLog (this=0x7fff84003b60) at ../src/arrow/util/logging.cc:50
#3 0x00007fffa4a37f29 in arrow::util::CerrLog::~CerrLog (this=0x7fff84003b60) at ../src/arrow/util/logging.cc:44
#4 0x00007fffa4a37b40 in arrow::util::ArrowLog::~ArrowLog (this=0x7fff9e63d188) at ../src/arrow/util/logging.cc:228
#5 0x00007fffa2f8e0cc in parquet::arrow::StructReader::GetDefLevels (this=0x7fff8400b4b0, data=0x7fff9e63d328, length=0x7fff9e63d320)
    at ../src/parquet/arrow/reader.cc:607
#6 0x00007fffa2f8d615 in parquet::arrow::StructReader::DefLevelsToNullArray (this=0x7fff8400b4b0, null_bitmap_out=0x7fff9e63d4b0, null_count_out=0x7fff9e63d4a8)
    at ../src/parquet/arrow/reader.cc:561
#7 0x00007fffa2f8e75b in parquet::arrow::StructReader::NextBatch (this=0x7fff8400b4b0, records_to_read=1777, out=0x55555607cd60)
    at ../src/parquet/arrow/reader.cc:647
#8 0x00007fffa2fa403b in parquet::arrow::FileReaderImpl::ReadSchemaField (this=0x555555eb6310, i=2, indices=..., row_groups=..., out_field=0x55555607cd20,
    out=0x55555607cd60) at ../src/parquet/arrow/reader.cc:182
{code}
was (Author: wesmckinn):
This is core dumping. I'm going to take a look and see if I can figure it out
> [Python] Segfault reading Parquet files from GNOMAD
> ---------------------------------------------------
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
> Reporter: David Konerding
> Assignee: Wes McKinney
> Priority: Minor
> Labels: parquet
> Fix For: 0.15.0
>
> Attachments: part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>
>
> I am getting a segfault when running a basic program on an Ubuntu 18.04 VM (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x00007fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, unsigned long*) () from /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, and it reads the file just fine.
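One way to narrow down a crash like the one in the report above is to read the file one top-level column at a time, each in its own Python process, so that a SIGSEGV or abort in the native reader only takes down the child. The following is a minimal sketch under stated assumptions, not the approach used to fix this issue: the helper names (run_isolated, describe, read_column_snippet) are hypothetical, the column names must be supplied by the caller, and it relies on the POSIX convention that subprocess reports a child killed by signal N with return code -N. The only pyarrow call it uses is pq.read_table(path, columns=[...]).

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run `snippet` in a fresh interpreter so a native crash cannot kill us.

    Returns (returncode, tail of the child's stderr).
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr[-2000:]


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX only: -N means the child was terminated by signal N.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def read_column_snippet(path, column):
    """Build the source of a child script that reads a single column."""
    return (
        "import pyarrow.parquet as pq\n"
        "pq.read_table(%r, columns=[%r])\n" % (path, column)
    )


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: first argument is the Parquet file,
    # remaining arguments are the column names to try.
    parquet_path = sys.argv[1]
    for name in sys.argv[2:]:
        code, stderr_tail = run_isolated(read_column_snippet(parquet_path, name))
        print("%s: %s" % (name, describe(code)))
```

Invoked as "python isolate.py <file> <col1> <col2> ..." (script name hypothetical), this prints one status per column. A column reported as killed by SIGSEGV or SIGABRT is a candidate for the field on which the struct reader fails.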
--
This message was sent by Atlassian Jira
(v8.3.2#803003)