Posted to dev@parquet.apache.org by "Reuben Kuhnert (JIRA)" <ji...@apache.org> on 2015/12/14 22:07:46 UTC

[jira] [Created] (PARQUET-406) Counter Initialization causes NPE

Reuben Kuhnert created PARQUET-406:
--------------------------------------

             Summary: Counter Initialization causes NPE
                 Key: PARQUET-406
                 URL: https://issues.apache.org/jira/browse/PARQUET-406
             Project: Parquet
          Issue Type: Bug
            Reporter: Reuben Kuhnert


{code}
CREATE EXTERNAL TABLE api_hit_parquet_test
ROW FORMAT SERDE 'com.foursquare.hadoop.hive.serde.RecordV2SerDe'
WITH SERDEPROPERTIES ('serialization.class' = 'com.foursquare.logs.gen.ApiHit')
STORED AS
  INPUTFORMAT 'com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/bly/api_hit_parquet'
TBLPROPERTIES ('thrift.parquetfile.input.format.thrift.class' = 'com.foursquare.logs.gen.ApiHit')
{code}

The table is created successfully, and I can verify that the schema is correct by running DESCRIBE FORMATTED on it. However, when I run a simple SELECT * against the table, I get the following stack trace:

{code}
java.io.IOException: java.lang.RuntimeException: Could not read first record (and it was not an EOF)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1657)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
Caused by: java.lang.RuntimeException: Could not read first record (and it was not an EOF)
        at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.initKeyValueObjects(DeprecatedInputFormatWrapper.java:280)
        at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.createValue(DeprecatedInputFormatWrapper.java:297)
        at com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat$$anon$1.<init>(HiveThriftParquetInputFormat.scala:47)
        at com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat.getRecordReader(HiveThriftParquetInputFormat.scala:46)
        at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:667)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
        ... 9 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://hadoop-alidoro-nn-vip/user/bly/api_hit_parquet/part-m-00000.parquet
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
        at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
        at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.initKeyValueObjects(DeprecatedInputFormatWrapper.java:271)
        ... 15 more
Caused by: java.lang.NullPointerException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.parquet.hadoop.util.ContextUtil.invoke(ContextUtil.java:264)
        at org.apache.parquet.hadoop.util.ContextUtil.incrementCounter(ContextUtil.java:273)
        at org.apache.parquet.hadoop.util.counters.mapreduce.MapReduceCounterAdapter.increment(MapReduceCounterAdapter.java:38)
        at org.apache.parquet.hadoop.util.counters.BenchmarkCounter.incrementTotalBytes(BenchmarkCounter.java:78)
        at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:497)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
        ... 17 more
{code}

I have spent some time following this stack trace, and the error appears to lie in the counter code, which is odd because I never interact with counters directly. Is there some way I need to initialize them?

To be specific, I have found that MapReduceCounterAdapter is being created with a null parameter. Here is the constructor:

{code}
public MapReduceCounterAdapter(Counter adaptee) {
  this.adaptee = adaptee;
}
{code}

So adaptee is passed in as null and then dereferenced later, which causes the NullPointerException.
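As a minimal, self-contained sketch (assuming the adapter simply delegates to the wrapped Hadoop Counter; the class names below are hypothetical stand-ins, not the actual Parquet source), this is how a null adaptee turns into the NPE seen in the trace:

{code}
// Hypothetical sketch: an adapter that, like MapReduceCounterAdapter,
// delegates blindly to the Counter it wraps. Constructing it with null
// reproduces the NullPointerException on the first increment.
public class CounterNpeSketch {
    interface Counter { void increment(long n); }

    static class MapReduceCounterAdapterLike {
        private final Counter adaptee;
        MapReduceCounterAdapterLike(Counter adaptee) { this.adaptee = adaptee; }
        void increment(long n) {
            adaptee.increment(n); // throws NPE when adaptee == null
        }
    }

    public static void main(String[] args) {
        MapReduceCounterAdapterLike adapter = new MapReduceCounterAdapterLike(null);
        try {
            adapter.increment(1);
        } catch (NullPointerException e) {
            System.out.println("NPE reproduced: adaptee was null");
        }
    }
}
{code}

Nothing in the adapter itself checks for null, so the failure only surfaces later, deep inside readNextRowGroup, exactly as in the stack trace above.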

The adaptee parameter is created by this method:

{code}
public static Counter getCounter(TaskInputOutputContext context,
                                 String groupName, String counterName) {
  return (Counter) invoke(GET_COUNTER_METHOD, context, groupName, counterName);
}
{code}
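If that reflective invoke returns null (for example because the Hive FetchOperator's context does not carry real counters), the adapter ends up wrapping null. One possible guard would be to substitute a no-op counter at the point where the adapter is constructed; the following is a hedged sketch under that assumption, with invented names, not a patch against the real Parquet code:

{code}
// Hypothetical guard: if the getCounter(...) lookup yields null, fall back
// to a no-op Counter so later increments can never throw an NPE.
public class NullSafeCounterSketch {
    interface Counter { void increment(long n); long getValue(); }

    static final Counter NOOP = new Counter() {
        public void increment(long n) { /* intentionally ignore */ }
        public long getValue() { return 0L; }
    };

    static Counter orNoop(Counter maybeNull) {
        return maybeNull != null ? maybeNull : NOOP;
    }

    public static void main(String[] args) {
        // Simulate the reflective lookup returning null:
        Counter c = orNoop(null);
        c.increment(42);                  // safe; no NullPointerException
        System.out.println(c.getValue()); // prints 0
    }
}
{code}

The trade-off is that byte counts are silently dropped in contexts without counters, which seems preferable to failing the whole read.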



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)