Posted to dev@parquet.apache.org by "Philip Felton (JIRA)" <ji...@apache.org> on 2018/08/09 09:37:00 UTC

[jira] [Created] (PARQUET-1374) Segfault on writing zero columns

Philip Felton created PARQUET-1374:
--------------------------------------

             Summary: Segfault on writing zero columns
                 Key: PARQUET-1374
                 URL: https://issues.apache.org/jira/browse/PARQUET-1374
             Project: Parquet
          Issue Type: Bug
            Reporter: Philip Felton


Here's a gist which reproduces it: [https://gist.github.com/philjdf/594ab431f135a040586aff08c7fb7666]
 1. The problem starts with the call to ParquetFileWriter::Close().
 2. That call reaches FileMetaDataBuilder::FileMetaDataBuilderImpl::Finish(), which relies on metadata_ being non-null. At the end of Finish(), metadata_ is std::moved out, leaving it null, so Finish() clearly assumes it is called only once.
 3. Later, still inside Close(), FlatSchemaConverter::Convert() is called, and it throws an exception because there are no columns.
 4. The exception unwinds out of the try block, which destructs the ParquetFileWriter. The destructor calls Close() again, which calls Finish() again, this time with a null metadata_, and segfaults.
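The re-entrancy above can be illustrated with a self-contained sketch. These are simplified stand-ins, not the real parquet-cpp classes; the names MetadataBuilder, Writer, and the g_saw_null_metadata flag are invented for illustration (the real code would dereference the null pointer and segfault rather than set a flag):

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Set when Finish() runs with a null metadata_; the real code would
// dereference the null pointer here and crash instead.
static bool g_saw_null_metadata = false;

// Simplified stand-in for FileMetaDataBuilderImpl.
struct MetadataBuilder {
    std::unique_ptr<std::string> metadata_ = std::make_unique<std::string>("meta");

    void Finish() {
        if (!metadata_) {
            g_saw_null_metadata = true;  // second call: metadata_ already moved out
            return;
        }
        auto result = std::move(metadata_);  // leaves metadata_ null
    }
};

// Simplified stand-in for FileSerializer.
struct Writer {
    MetadataBuilder builder_;
    bool is_open_ = true;

    void Close() {
        if (is_open_) {
            builder_.Finish();  // first call consumes metadata_
            ConvertSchema();    // throws, so the next line never runs
            is_open_ = false;
        }
    }
    ~Writer() {
        try { Close(); } catch (...) {}  // destructor re-enters Close()
    }

private:
    static void ConvertSchema() {
        throw std::runtime_error("at least one column required");
    }
};

bool SecondFinishSeesNull() {
    g_saw_null_metadata = false;
    {
        Writer w;
        try { w.Close(); } catch (const std::runtime_error&) {}
    }  // ~Writer runs here: Close() again, Finish() with null metadata_
    return g_saw_null_metadata;
}
```

Because the throw happens before is_open_ is flipped, the destructor's Close() takes the same path a second time.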

So FileSerializer::Close() in file_writer.cc is presumably wrong: it should set is_open_ to false at the start of the if block rather than at the end, so that a re-entrant Close() from the destructor becomes a no-op.
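A sketch of that reordering (again simplified, hypothetical code rather than the actual file_writer.cc; the g_finish_calls counter is invented for illustration). Flipping is_open_ before any work that can throw means the destructor's second Close() returns immediately, so the metadata is consumed exactly once:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Counts how many times the close path does real work; illustrative only.
static int g_finish_calls = 0;

struct FixedWriter {
    std::unique_ptr<std::string> metadata_ = std::make_unique<std::string>("meta");
    bool is_open_ = true;

    void Close() {
        if (is_open_) {
            is_open_ = false;                    // flip first: re-entry is a no-op
            ++g_finish_calls;
            auto result = std::move(metadata_);  // consumed exactly once
            throw std::runtime_error("at least one column required");
        }
    }
    ~FixedWriter() {
        try { Close(); } catch (...) {}  // now harmless
    }
};

int FinishCallCount() {
    g_finish_calls = 0;
    {
        FixedWriter w;
        try { w.Close(); } catch (const std::runtime_error&) {}
    }  // destructor's Close() sees is_open_ == false and returns immediately
    return g_finish_calls;
}
```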

Getting an exception rather than a segfault would already be an improvement, but ideally I'd like to be able to write and read Parquet files with zero rows and/or zero columns. That would mean one fewer edge case for client code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)