Posted to dev@parquet.apache.org by "Michal Turek (JIRA)" <ji...@apache.org> on 2016/02/24 10:43:18 UTC

[jira] [Updated] (PARQUET-544) ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract

     [ https://issues.apache.org/jira/browse/PARQUET-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michal Turek updated PARQUET-544:
---------------------------------
    Description: 
{{org.apache.parquet.hadoop.ParquetWriter}} implements {{java.io.Closeable}}, but its {{close()}} method doesn't honor that interface's contract. The contract states "If the stream is already closed then invoking this method has no effect.", yet calling {{ParquetWriter.close()}} a second time throws {{NullPointerException}}.

Its source is quite obvious: {{columnStore}} is set to null during the first close and then dereferenced again on the next one. There is no "if already closed" condition to prevent it.
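The failure is trivially reproducible by closing the same writer twice. The following is only an illustrative, self-contained sketch using the example writer from the parquet-hadoop module (the {{ExampleParquetWriter}} builder names are assumptions here, they are not taken from this report); the second {{close()}} produces the stack trace below.

{noformat}
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class DoubleCloseRepro {
    public static void main(String[] args) throws Exception {
        MessageType schema = MessageTypeParser.parseMessageType(
                "message repro { required int32 id; }");

        ParquetWriter<Group> writer = ExampleParquetWriter
                .builder(new Path("/tmp/parquet-544-repro.parquet"))   // must not exist yet
                .withType(schema)
                .build();
        writer.write(new SimpleGroupFactory(schema).newGroup().append("id", 1));

        writer.close();   // first call flushes the row group and nulls columnStore internally
        writer.close();   // second call fails with the NullPointerException below
    }
}
{noformat}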

{noformat}
java.lang.NullPointerException: null
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:157) ~[parquet-hadoop-1.8.1.jar:1.8.1]
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113) ~[parquet-hadoop-1.8.1.jar:1.8.1]
	at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297) ~[parquet-hadoop-1.8.1.jar:1.8.1]
{noformat}

{noformat}
  private void flushRowGroupToStore()
      throws IOException {
    // NPE on a second close(): columnStore has already been set to null at the end of this method
    LOG.info(format("Flushing mem columnStore to file. allocated memory: %,d", columnStore.getAllocatedSize()));
    if (columnStore.getAllocatedSize() > (3 * rowGroupSizeThreshold)) {
      LOG.warn("Too much memory used: " + columnStore.memUsageString());
    }

    if (recordCount > 0) {
      parquetFileWriter.startBlock(recordCount);
      columnStore.flush();
      pageStore.flushToFileWriter(parquetFileWriter);
      recordCount = 0;
      parquetFileWriter.endBlock();
      this.nextRowGroupSize = Math.min(
          parquetFileWriter.getNextRowGroupSize(),
          rowGroupSizeThreshold);
    }

    columnStore = null;
    pageStore = null;
  }
{noformat}
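
For reference, a guard of roughly the following shape in the internal writer would satisfy the {{Closeable}} contract. This is only an illustrative sketch of the missing "if already closed" condition, not the actual parquet-mr code; the flag name and the surrounding method body are assumptions.

{noformat}
  // Illustrative sketch only, not the actual parquet-mr implementation.
  // An explicit flag makes the second and later close() calls no-ops,
  // as the Closeable contract requires.
  private boolean closed;

  public void close() throws IOException {
    if (closed) {
      return;                  // already closed: no effect
    }
    closed = true;
    flushRowGroupToStore();    // columnStore/pageStore are still non-null on the first call
    // ... write the file footer and release resources as before ...
  }
{noformat}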

A known workaround is to explicitly prevent the second and any subsequent {{close()}} calls in the application code.

{noformat}
    private final ParquetWriter<V> writer;
    private boolean closed;

    private void closeWriterOnlyOnce() throws IOException {
        if (!closed) {
            closed = true;   // set before close() so a failed close() is not retried
            writer.close();
        }
    }
{noformat}
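
In practice such a guard usually sits in an application-side wrapper around the writer, which can then be used safely in try-with-resources even if it is also closed explicitly elsewhere. The class below is only an illustration; its name and shape are not part of Parquet.

{noformat}
import java.io.Closeable;
import java.io.IOException;

import org.apache.parquet.hadoop.ParquetWriter;

// Illustrative application-side wrapper; the class name is made up.
public class SafeParquetWriter<V> implements Closeable {
    private final ParquetWriter<V> writer;
    private boolean closed;

    public SafeParquetWriter(ParquetWriter<V> writer) {
        this.writer = writer;
    }

    public void write(V value) throws IOException {
        writer.write(value);
    }

    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true;      // set first so a failed close() is not retried
            writer.close();
        }
    }
}
{noformat}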

  was:
{{org.apache.parquet.hadoop.ParquetWriter}} implements {{java.io.Closeable}}, but its {{close()}} method doesn't honor that interface's contract. The contract states "If the stream is already closed then invoking this method has no effect.", yet calling {{ParquetWriter.close()}} a second time throws {{NullPointerException}}.

Its source is quite obvious: {{columnStore}} is set to null during the first close and then dereferenced again on the next one. There is no "if already closed" condition to prevent it.

{noformat}
java.lang.NullPointerException: null
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:157) ~[parquet-hadoop-1.8.1.jar:1.8.1]
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113) ~[parquet-hadoop-1.8.1.jar:1.8.1]
	at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297) ~[parquet-hadoop-1.8.1.jar:1.8.1]
{noformat}

{noformat}
  private void flushRowGroupToStore()
      throws IOException {
    LOG.info(format("Flushing mem columnStore to file. allocated memory: %,d", columnStore.getAllocatedSize()));
    if (columnStore.getAllocatedSize() > (3 * rowGroupSizeThreshold)) {
      LOG.warn("Too much memory used: " + columnStore.memUsageString());
    }

    if (recordCount > 0) {
      parquetFileWriter.startBlock(recordCount);
      columnStore.flush();
      pageStore.flushToFileWriter(parquetFileWriter);
      recordCount = 0;
      parquetFileWriter.endBlock();
      this.nextRowGroupSize = Math.min(
          parquetFileWriter.getNextRowGroupSize(),
          rowGroupSizeThreshold);
    }

    columnStore = null;
    pageStore = null;
  }
{noformat}

A known workaround is to explicitly prevent the second and any subsequent {{close()}} calls in the application code.

{noformat}
    private final ParquetWriter<V> writer;
    private boolean closed;

    private void closeWriterOnlyOnce() throws IOException {
        if (!closed) {
            writer.close();
            closed = true;
        }
    }
{noformat}


> ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-544
>                 URL: https://issues.apache.org/jira/browse/PARQUET-544
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.8.1
>            Reporter: Michal Turek
>            Priority: Minor


