Posted to dev@parquet.apache.org by "Michal Turek (JIRA)" <ji...@apache.org> on 2016/02/24 10:43:18 UTC
[jira] [Updated] (PARQUET-544) ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
[ https://issues.apache.org/jira/browse/PARQUET-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michal Turek updated PARQUET-544:
---------------------------------
Description:
{{org.apache.parquet.hadoop.ParquetWriter}} implements {{java.io.Closeable}}, but its {{close()}} method doesn't follow the interface contract properly. The contract states "If the stream is already closed then invoking this method has no effect.", but {{ParquetWriter}} instead throws {{NullPointerException}} on a second call.
Its source is quite obvious: {{columnStore}} is set to null at the end of the first close and then dereferenced again on the next one. There is no "if already closed" guard to prevent it.
{noformat}
java.lang.NullPointerException: null
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:157) ~[parquet-hadoop-1.8.1.jar:1.8.1]
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113) ~[parquet-hadoop-1.8.1.jar:1.8.1]
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297) ~[parquet-hadoop-1.8.1.jar:1.8.1]
{noformat}
{noformat}
private void flushRowGroupToStore()
    throws IOException {
  LOG.info(format("Flushing mem columnStore to file. allocated memory: %,d", columnStore.getAllocatedSize()));
  if (columnStore.getAllocatedSize() > (3 * rowGroupSizeThreshold)) {
    LOG.warn("Too much memory used: " + columnStore.memUsageString());
  }
  if (recordCount > 0) {
    parquetFileWriter.startBlock(recordCount);
    columnStore.flush();
    pageStore.flushToFileWriter(parquetFileWriter);
    recordCount = 0;
    parquetFileWriter.endBlock();
    this.nextRowGroupSize = Math.min(
        parquetFileWriter.getNextRowGroupSize(),
        rowGroupSizeThreshold);
  }
  columnStore = null;
  pageStore = null;
}
{noformat}
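The failure mode can be reproduced with a self-contained sketch (hypothetical stand-in classes, not Parquet code): a {{close()}} that nulls its internal state and unconditionally dereferences it again throws {{NullPointerException}} on the second call, while an "already closed" guard satisfies the {{Closeable}} contract.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for the buggy pattern: state is nulled on close
// and dereferenced again if close() runs a second time.
class NaiveWriter implements Closeable {
    private StringBuilder columnStore = new StringBuilder("pending row group");

    @Override
    public void close() throws IOException {
        columnStore.setLength(0);  // NPE here on the second call
        columnStore = null;
    }
}

// The same class with an "if already closed" guard, satisfying the
// Closeable contract: second and later calls have no effect.
class GuardedWriter implements Closeable {
    private StringBuilder columnStore = new StringBuilder("pending row group");
    private boolean closed;

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        columnStore.setLength(0);
        columnStore = null;
    }
}
```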
A known workaround is to guard against the second and subsequent close() calls explicitly in the application code.
{noformat}
private final ParquetWriter<V> writer;
private boolean closed;

private void closeWriterOnlyOnce() throws IOException {
  if (!closed) {
    closed = true;
    writer.close();
  }
}
{noformat}
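The workaround can be exercised with a self-contained sketch ({{CountingWriter}} and {{CloseOnceWrapper}} are hypothetical names, not Parquet API): the flag ensures the delegate's {{close()}} runs at most once, no matter how many times the caller closes.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical delegate that counts how many times it is closed,
// standing in for the real ParquetWriter.
class CountingWriter implements Closeable {
    int closeCalls = 0;

    @Override
    public void close() throws IOException {
        closeCalls++;
    }
}

// Application-side workaround: the boolean flag makes close() idempotent
// from the wrapper's point of view.
class CloseOnceWrapper implements Closeable {
    private final CountingWriter writer;
    private boolean closed;

    CloseOnceWrapper(CountingWriter writer) {
        this.writer = writer;
    }

    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true;   // set before delegating: a failed close is not retried
            writer.close();
        }
    }
}
```

Note the design choice in the flag ordering: setting {{closed}} before delegating means a {{close()}} that throws is never retried, which is exactly what this bug requires; setting it after the call would retry on failure and hit the NPE again.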
was:
{{org.apache.parquet.hadoop.ParquetWriter}} implements {{java.io.Closeable}}, but its {{close()}} method doesn't follow the interface contract properly. The contract states "If the stream is already closed then invoking this method has no effect.", but {{ParquetWriter}} instead throws {{NullPointerException}} on a second call.
Its source is quite obvious: {{columnStore}} is set to null at the end of the first close and then dereferenced again on the next one. There is no "if already closed" guard to prevent it.
{noformat}
java.lang.NullPointerException: null
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:157) ~[parquet-hadoop-1.8.1.jar:1.8.1]
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113) ~[parquet-hadoop-1.8.1.jar:1.8.1]
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297) ~[parquet-hadoop-1.8.1.jar:1.8.1]
{noformat}
{noformat}
private void flushRowGroupToStore()
    throws IOException {
  LOG.info(format("Flushing mem columnStore to file. allocated memory: %,d", columnStore.getAllocatedSize()));
  if (columnStore.getAllocatedSize() > (3 * rowGroupSizeThreshold)) {
    LOG.warn("Too much memory used: " + columnStore.memUsageString());
  }
  if (recordCount > 0) {
    parquetFileWriter.startBlock(recordCount);
    columnStore.flush();
    pageStore.flushToFileWriter(parquetFileWriter);
    recordCount = 0;
    parquetFileWriter.endBlock();
    this.nextRowGroupSize = Math.min(
        parquetFileWriter.getNextRowGroupSize(),
        rowGroupSizeThreshold);
  }
  columnStore = null;
  pageStore = null;
}
{noformat}
A known workaround is to guard against the second and subsequent close() calls explicitly in the application code.
{noformat}
private final ParquetWriter<V> writer;
private boolean closed;

private void closeWriterOnlyOnce() throws IOException {
  if (!closed) {
    writer.close();
    closed = true;
  }
}
{noformat}
> ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
> ---------------------------------------------------------------------------------------------------------------
>
> Key: PARQUET-544
> URL: https://issues.apache.org/jira/browse/PARQUET-544
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.1
> Reporter: Michal Turek
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)