Posted to dev@parquet.apache.org by "e.birukov (JIRA)" <ji...@apache.org> on 2018/02/20 13:38:00 UTC
[jira] [Comment Edited] (PARQUET-860) ParquetWriter.getDataSize NullPointerException after closed
[ https://issues.apache.org/jira/browse/PARQUET-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370053#comment-16370053 ]
e.birukov edited comment on PARQUET-860 at 2/20/18 1:37 PM:
------------------------------------------------------------
I get the same error.
It happens when S3 is temporarily unavailable:
....FileSystemException...
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
at org.apache.hadoop.fs.s3a.S3AOutputStream.close(S3AOutputStream.java:121)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:643)
Then, when I call the close() method again:
Caused by: java.lang.NullPointerException
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:162)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:301)
I read data from a stream and write it using ParquetWriter, so this problem is critical: it causes data loss!
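
The failure sequence above (the first close() throws after the buffer has already been flushed, and the second close() then hits a null buffer) can be illustrated with a small stand-in class. This is only a sketch under assumptions, not parquet-mr code: `RetryableCloseSketch`, `Sink`, and `flushedBytes` are hypothetical names, and the `StringBuilder` merely plays the role of `columnStore`. It shows one way a close() could tolerate being called again after a failure. Whether the underlying S3 output stream can actually be finished on a second attempt is outside what this sketch covers.

```java
import java.io.IOException;

// Hypothetical stand-in for a record writer; not the parquet-mr implementation.
final class RetryableCloseSketch {
    // Hypothetical sink standing in for the file writer whose end() may fail.
    interface Sink {
        void finish(long bytes) throws IOException;
    }

    private StringBuilder columnStore = new StringBuilder(); // null once flushed
    private long flushedBytes = 0;
    private boolean closed = false;
    private final Sink sink;

    RetryableCloseSketch(Sink sink) {
        this.sink = sink;
    }

    void write(String record) {
        if (closed || columnStore == null) {
            throw new IllegalStateException("writer is closed or closing");
        }
        columnStore.append(record);
    }

    void close() throws IOException {
        if (closed) {
            return; // a repeated close after success is a no-op
        }
        if (columnStore != null) {
            // Flush only once; a retry after a failed finish() skips this step
            // instead of dereferencing the already-cleared buffer.
            flushedBytes += columnStore.length();
            columnStore = null;
        }
        sink.finish(flushedBytes); // may throw, e.g. on a transient outage
        closed = true;             // only reached when finish() succeeded
    }
}
```

With this shape, a caller that catches the IOException from the first close() can call close() again and gets either a second IOException or success, but not a NullPointerException.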
> ParquetWriter.getDataSize NullPointerException after closed
> -----------------------------------------------------------
>
> Key: PARQUET-860
> URL: https://issues.apache.org/jira/browse/PARQUET-860
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.9.0
> Environment: Linux prim 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux
> openjdk version "1.8.0_112"
> OpenJDK Runtime Environment (build 1.8.0_112-b15)
> OpenJDK 64-Bit Server VM (build 25.112-b15, mixed mode)
> Reporter: Mike Mintz
> Priority: Major
>
> When I run {{ParquetWriter.getDataSize()}}, it works normally. But after I call {{ParquetWriter.close()}}, subsequent calls to {{ParquetWriter.getDataSize()}} result in a NullPointerException.
> {noformat}
> java.lang.NullPointerException
> at org.apache.parquet.hadoop.InternalParquetRecordWriter.getDataSize(InternalParquetRecordWriter.java:132)
> at org.apache.parquet.hadoop.ParquetWriter.getDataSize(ParquetWriter.java:314)
> at FileBufferState.getFileSizeInBytes(FileBufferState.scala:83)
> {noformat}
> The reason for the NPE appears to be in {{InternalParquetRecordWriter.getDataSize}}, where it assumes that {{columnStore}} is not null.
> But the {{close()}} method calls {{flushRowGroupToStore()}} which sets {{columnStore = null}}.
> I'm guessing that once the file is closed, we can just return {{lastRowGroupEndPos}} since there should be no more buffered data, but I don't fully understand how this class works.
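
The guess in the description (return {{lastRowGroupEndPos}} once the buffer is gone) can be sketched with a simplified stand-in. This is a minimal illustration, assuming the writer tracks a flushed position plus a buffered size. `SketchRecordWriter` is a hypothetical class and the `StringBuilder` only imitates `columnStore`, so it should not be read as the actual parquet-mr fix.

```java
// Simplified stand-in for the writer described above; not the parquet-mr source.
final class SketchRecordWriter {
    // Hypothetical buffer imitating columnStore; set to null when flushed on close.
    private StringBuilder columnStore = new StringBuilder();
    private long lastRowGroupEndPos = 0;

    void write(String record) {
        columnStore.append(record);
    }

    void close() {
        if (columnStore == null) {
            return; // already flushed, nothing left to do
        }
        flushRowGroupToStore();
    }

    private void flushRowGroupToStore() {
        lastRowGroupEndPos += columnStore.length();
        columnStore = null; // mirrors the behaviour reported in the issue
    }

    long getDataSize() {
        if (columnStore == null) {
            // After close there is no buffered data, so only the flushed size counts.
            return lastRowGroupEndPos;
        }
        return lastRowGroupEndPos + columnStore.length();
    }
}
```

In this sketch, getDataSize() reports the same value before and after close(), because the buffered bytes simply move into the flushed total.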