Posted to dev@parquet.apache.org by "Daniel Dai (Jira)" <ji...@apache.org> on 2021/01/19 00:11:01 UTC

[jira] [Created] (PARQUET-1963) DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty

Daniel Dai created PARQUET-1963:
-----------------------------------

             Summary: DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty
                 Key: PARQUET-1963
                 URL: https://issues.apache.org/jira/browse/PARQUET-1963
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
            Reporter: Daniel Dai
            Assignee: Daniel Dai


This is a follow-up to PARQUET-1947. With that fix applied, an NPE is thrown when the first sub-split in a CombineFileInputFormat is empty:
{code}
Caused by: java.lang.NullPointerException
	at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:154)
	at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:73)
	at cascading.tap.hadoop.io.CombineFileRecordReaderWrapper.next(CombineFileRecordReaderWrapper.java:70)
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:58)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
	at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
	at org.apache.parquet.cascading.ParquetTupleScheme.source(ParquetTupleScheme.java:160)
	at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:163)
	at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:136)
	... 10 more
{code}

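The cause is that CombineFileInputFormat uses the result of createValue() from the first sub-split as the value container for the whole combined split. Because the first sub-split is empty, that container is null, and next() fails as soon as a later, non-empty sub-split tries to fill it.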
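
To make the failure mode concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Hadoop or Parquet APIs: `CombineSketch`, `SubSplitReader`, `Container`, and the `alwaysCreateContainer` flag are hypothetical names invented for illustration, and the "fixed" path is only one possible mitigation, not necessarily the patch applied for this issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CombineSketch {
    /** Mutable value holder, standing in for the reader's value container. */
    static final class Container {
        String value;
    }

    /** Hypothetical stand-in for the record reader of one sub-split. */
    static final class SubSplitReader {
        private final List<String> records;
        private final boolean alwaysCreateContainer;
        private int pos = 0;

        SubSplitReader(List<String> records, boolean alwaysCreateContainer) {
            this.records = records;
            this.alwaysCreateContainer = alwaysCreateContainer;
        }

        /** Models the reported behavior: an empty sub-split yields a null container. */
        Container createValue() {
            if (records.isEmpty() && !alwaysCreateContainer) {
                return null;
            }
            return new Container();
        }

        /** Copies the next record into the caller's container, if any record is left. */
        boolean next(Container container) {
            if (pos >= records.size()) {
                return false;
            }
            container.value = records.get(pos++); // NPE when container is null
            return true;
        }
    }

    /** Reads every sub-split into one container taken from the first reader. */
    static int readAll(List<SubSplitReader> readers) {
        Container shared = readers.get(0).createValue();
        int count = 0;
        for (SubSplitReader reader : readers) {
            while (reader.next(shared)) {
                count++;
            }
        }
        return count;
    }

    /** Builds an empty first sub-split followed by one holding two records. */
    static List<SubSplitReader> splits(boolean alwaysCreateContainer) {
        return Arrays.asList(
                new SubSplitReader(Collections.<String>emptyList(), alwaysCreateContainer),
                new SubSplitReader(Arrays.asList("a", "b"), alwaysCreateContainer));
    }

    public static void main(String[] args) {
        try {
            readAll(splits(false));
        } catch (NullPointerException e) {
            System.out.println("NPE on the second sub-split, as in the report");
        }
        System.out.println(readAll(splits(true)) + " records read");
    }
}
```

In this sketch the empty first reader never touches the container, so the null only surfaces once the second reader has a record to store. That is consistent with the stack trace above, where the exception is raised inside next() rather than createValue().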


