Posted to dev@parquet.apache.org by "Hatem Helal (JIRA)" <ji...@apache.org> on 2019/04/18 14:23:00 UTC
[jira] [Commented] (PARQUET-1565) [C++] SEGV in FromParquetSchema with corrupt file from PARQUET-1481
[ https://issues.apache.org/jira/browse/PARQUET-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821152#comment-16821152 ]
Hatem Helal commented on PARQUET-1565:
--------------------------------------
This is a somewhat esoteric problem, but the fix seems to be to extend [this switch case|https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/schema.cc#L174] to handle the corrupted thrift metadata.
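To illustrate the general idea, here is a minimal, self-contained sketch of a switch that rejects unrecognized enum values instead of falling through with an unset result. It is not the actual code in schema.cc: the names PhysicalType, Status and ConvertPhysicalType are hypothetical stand-ins, and the assumption is that corrupt metadata can decode to an integer that matches no known enumerator.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a type enum decoded from file metadata.
// The fixed underlying type means any int can be cast to it without
// undefined behavior, which is exactly what corrupt input can produce.
enum class PhysicalType : int { kBoolean = 0, kInt32 = 1, kInt64 = 2, kDouble = 5 };

// Hypothetical minimal status type; a real code base would use its own.
struct Status {
  bool ok;
  std::string message;
};

// Converts a raw metadata value to a type name. The default branch turns an
// unrecognized value into an error status, so the caller never proceeds
// with an output that was not set.
Status ConvertPhysicalType(int raw_value, std::string* out) {
  assert(out != nullptr);
  switch (static_cast<PhysicalType>(raw_value)) {
    case PhysicalType::kBoolean:
      *out = "bool";
      return {true, ""};
    case PhysicalType::kInt32:
      *out = "int32";
      return {true, ""};
    case PhysicalType::kInt64:
      *out = "int64";
      return {true, ""};
    case PhysicalType::kDouble:
      *out = "double";
      return {true, ""};
    default:
      // Corrupt or unsupported metadata ends up here.
      return {false, "Unrecognized physical type: " + std::to_string(raw_value)};
  }
}
```

With this shape, a caller checks the returned status before touching the output, so a corrupt value such as 42 yields an error message rather than a crash further down.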
> [C++] SEGV in FromParquetSchema with corrupt file from PARQUET-1481
> -------------------------------------------------------------------
>
> Key: PARQUET-1565
> URL: https://issues.apache.org/jira/browse/PARQUET-1565
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Affects Versions: cpp-1.6.0
> Reporter: Hatem Helal
> Assignee: Hatem Helal
> Priority: Minor
>
> Calling {{parquet::arrow::FromParquetSchema}} when reading the corrupt file attached to PARQUET-1481 results in a SEGV. I'm not sure when this was introduced but I didn't observe this problem with our app that uses parquet-cpp v1.4.0. Our team caught this while integrating Arrow 0.12.1 into MATLAB.
> To reproduce this, add the following lines to [parquet-reader.cc|https://github.com/apache/arrow/blob/master/cpp/tools/parquet/parquet-reader.cc#L66], build, and try to read the corrupt file attached to PARQUET-1481.
> {code:java}
> const auto parquet_schema = reader->metadata()->schema();
> std::shared_ptr<::arrow::Schema> arrow_schema;
> PARQUET_THROW_NOT_OK(parquet::arrow::FromParquetSchema(parquet_schema, &arrow_schema));{code}
>