Posted to jira@arrow.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2021/09/12 21:54:00 UTC

[jira] [Updated] (ARROW-13676) [C++] Coredump writing Arrow table to Parquet file

     [ https://issues.apache.org/jira/browse/ARROW-13676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Kornfield updated ARROW-13676:
------------------------------------
    Affects Version/s:     (was: 6.0.0)

> [C++] Coredump writing Arrow table to Parquet file
> --------------------------------------------------
>
>                 Key: ARROW-13676
>                 URL: https://issues.apache.org/jira/browse/ARROW-13676
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Parquet
>    Affects Versions: 5.0.0, 4.0.1
>            Reporter: Shuai Zhang
>            Assignee: Micah Kornfield
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 5.0.1, 6.0.0
>
>         Attachments: callstack.txt
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> I'm hitting a core dump that appears randomly when converting user data from Google Protobuf format to an Apache Parquet file via the Apache Arrow C++ project. For specific user data, the problem can be reproduced reliably with ASAN enabled. The ASAN call stack is exactly the same as the core dump call stack (posted in the attachment; compiled against apache-arrow-4.0.1 built without jemalloc).
> I did some initial investigation:
> # The directly constructed Arrow table triggers this issue. Cloning it in different ways yields different results, even though all of the clones are equal according to `table.Equals(other)`. All of the tables pass `ValidateFull()`.
> ## Serializing and then deserializing the table was safe.
> ## CombineChunks didn't help.
> ## Cloning with TableBatchReader didn't help.
> ## Applying CombineChunks or TableBatchReader cloning to the deserialized table was still safe.
> # The problem is triggered in several different environments, so I don't think the issue is related to glibc:
> ## Debian 8 + gcc 4.9.2
> ## Debian 9 + gcc 6.3.0
> ## Debian 11 + gcc 10.2.1
> ## Ubuntu 20.04 LTS + clang 12.0.1
> The issue can be reproduced with https://github.com/hcoona/arrow/commit/8fa6cdb0c756c17ea3edc43b7b73c717823bda85



--
This message was sent by Atlassian Jira
(v8.3.4#803005)