Posted to dev@parquet.apache.org by Stephen Bly <bl...@foursquare.com> on 2015/12/14 19:22:17 UTC

Error due to null Counter

Greetings Parquet developers. I am trying to create my own custom InputFormat for reading Parquet tables in Hive. This is how I create the table:

CREATE EXTERNAL TABLE api_hit_parquet_test
ROW FORMAT SERDE 'com.foursquare.hadoop.hive.serde.RecordV2SerDe'
WITH SERDEPROPERTIES ('serialization.class' = 'com.foursquare.logs.gen.ApiHit')
STORED AS
  INPUTFORMAT 'com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/bly/api_hit_parquet'
TBLPROPERTIES ('thrift.parquetfile.input.format.thrift.class' = 'com.foursquare.logs.gen.ApiHit')

The table is successfully created, and I can verify the schema is correct by running DESCRIBE FORMATTED on it. However, when I try to do a simple SELECT * on the table, I get the following stack trace:

java.io.IOException: java.lang.RuntimeException: Could not read first record (and it was not an EOF)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1657)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
Caused by: java.lang.RuntimeException: Could not read first record (and it was not an EOF)
	at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.initKeyValueObjects(DeprecatedInputFormatWrapper.java:280)
	at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.createValue(DeprecatedInputFormatWrapper.java:297)
	at com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat$$anon$1.<init>(HiveThriftParquetInputFormat.scala:47)
	at com.foursquare.hadoop.hive.io.HiveThriftParquetInputFormat.getRecordReader(HiveThriftParquetInputFormat.scala:46)
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:667)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
	... 9 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://hadoop-alidoro-nn-vip/user/bly/api_hit_parquet/part-m-00000.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
	at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.initKeyValueObjects(DeprecatedInputFormatWrapper.java:271)
	... 15 more
Caused by: java.lang.NullPointerException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.parquet.hadoop.util.ContextUtil.invoke(ContextUtil.java:264)
	at org.apache.parquet.hadoop.util.ContextUtil.incrementCounter(ContextUtil.java:273)
	at org.apache.parquet.hadoop.util.counters.mapreduce.MapReduceCounterAdapter.increment(MapReduceCounterAdapter.java:38)
	at org.apache.parquet.hadoop.util.counters.BenchmarkCounter.incrementTotalBytes(BenchmarkCounter.java:78)
	at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:497)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
	... 17 more

I have spent some time following this stack trace, and the error appears to lie in the counter code, which is odd because I don’t use counters at all. Is there some way I need to initialize them?

To be specific, I have found that MapReduceCounterAdapter is being created with a null parameter. Here is the constructor:

public MapReduceCounterAdapter(Counter adaptee) {
    this.adaptee = adaptee;
}

So adaptee is being passed in as null, and a method is later invoked on it, causing the NullPointerException.
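To illustrate what I mean, here is a minimal, self-contained sketch (simplified stand-in types, not the real Parquet classes): the adapter stores a null adaptee without complaint, so nothing fails at construction time, and the NullPointerException only surfaces on the first increment.

```java
public class NullAdapteeDemo {
    // Stand-in for org.apache.hadoop.mapreduce.Counter
    interface Counter {
        void increment(long n);
    }

    // Mirrors the MapReduceCounterAdapter pattern: the constructor
    // accepts null silently, deferring the failure to first use.
    static class Adapter {
        private final Counter adaptee;

        Adapter(Counter adaptee) {
            this.adaptee = adaptee;
        }

        void increment(long n) {
            adaptee.increment(n); // NPE here when adaptee is null
        }
    }

    public static void main(String[] args) {
        Adapter adapter = new Adapter(null); // succeeds silently
        try {
            adapter.increment(1);
        } catch (NullPointerException e) {
            System.out.println("NPE at first increment, not at construction");
        }
    }
}
```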

The adaptee parameter is created by this method:

public static Counter getCounter(TaskInputOutputContext context,
                                 String groupName, String counterName) {
    return (Counter) invoke(GET_COUNTER_METHOD, context, groupName, counterName);
}
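One thing I noticed (a hypothetical demo, not the actual Parquet code): Method.invoke simply forwards whatever the underlying method returns, null included, so if the context's getCounter yields null, the reflective wrapper hands it back with no error at the call site.

```java
import java.lang.reflect.Method;

public class ReflectiveNullDemo {
    // Hypothetical context whose getCounter can return null, e.g. when
    // counters are unavailable in the current execution environment.
    public static class FakeContext {
        public Object getCounter(String groupName, String counterName) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        Method getCounter =
            FakeContext.class.getMethod("getCounter", String.class, String.class);
        // Reflection does not flag a null return value; the caller
        // receives it silently and the failure shows up much later.
        Object counter = getCounter.invoke(new FakeContext(), "parquet", "bytesread");
        System.out.println(counter == null); // prints true
    }
}
```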

I am really quite stuck. Has anyone else had problems with this? Is there some code I need to add to get counters working properly?

Re: Error due to null Counter

Posted by Reuben Kuhnert <re...@cloudera.com>.
Oh, and *groupName*. Missed that one.

Thanks

On Tue, Dec 15, 2015 at 9:58 AM, Reuben Kuhnert <reuben.kuhnert@cloudera.com> wrote:
Re: Error due to null Counter

Posted by Reuben Kuhnert <re...@cloudera.com>.
Hi again,

So I'm looking into your issue, and I'm wondering if you can send me a few
pieces of information.

(1) Can I get a stacktrace when this line is called?

  public MapReduceCounterAdapter(Counter adaptee) {
      this.adaptee = adaptee;
  }

(2) Can you send me information about this:

  @Override
  public ICounter getCounterByNameAndFlag(String groupName, String counterName, String counterFlag) {
      if (ContextUtil.getConfiguration(context).getBoolean(counterFlag, true)) {
          return new MapReduceCounterAdapter(ContextUtil.getCounter(context, groupName, counterName));
      } else {
          return new BenchmarkCounter.NullCounter();
      }
  }

In particular, what is *context*, *counterName* and *counterFlag*?
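For what it's worth, the else branch above suggests one possible mitigation while we investigate: when the counter's flag is set to false in the job configuration, the loader hands back a no-op NullCounter instead of wrapping a possibly-null Hadoop counter. Here is a simplified stand-alone sketch of that branch (simplified types, a plain Map in place of the Hadoop Configuration, and a made-up flag name; the real flag names depend on your Parquet version):

```java
import java.util.HashMap;
import java.util.Map;

public class CounterFlagDemo {
    interface ICounter {
        void increment(long n);
    }

    // No-op counter in the spirit of BenchmarkCounter.NullCounter:
    // safe to increment even when no real Hadoop counter exists.
    static class NullCounter implements ICounter {
        public void increment(long n) { /* intentionally does nothing */ }
    }

    // Stand-in for the flag lookup done via ContextUtil.getConfiguration:
    // flags default to true, so the wrapping branch is taken unless the
    // flag is explicitly set to false.
    static ICounter getCounter(Map<String, Boolean> conf, String counterFlag) {
        if (conf.getOrDefault(counterFlag, true)) {
            throw new IllegalStateException("would wrap a real (possibly null) counter");
        }
        return new NullCounter();
    }

    public static void main(String[] args) {
        Map<String, Boolean> conf = new HashMap<>();
        conf.put("example.counter.flag", false); // hypothetical flag name
        ICounter counter = getCounter(conf, "example.counter.flag");
        counter.increment(42); // no NPE: the null object absorbs the call
        System.out.println("NullCounter increments safely");
    }
}
```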

Thanks

On Mon, Dec 14, 2015 at 3:18 PM, Stephen Bly <st...@gmail.com> wrote:

Re: Error due to null Counter

Posted by Stephen Bly <st...@gmail.com>.
Thanks so much for looking into this! I’m pretty sure the issue is on my end and not in the Parquet/Hive code (I’m rather inexperienced in the world of Big Data and Hadoop in particular). But the error message is a little obscure so I can’t figure out what I’m doing wrong to fix it.

Let me know if you need to see any more of my code to help you investigate this.

Re: Error due to null Counter

Posted by Reuben Kuhnert <re...@cloudera.com>.
Hi Stephen,

I created a ticket, https://issues.apache.org/jira/browse/PARQUET-406, to
track your issue. We'll look into it and get back to you.

Thanks, and let us know if you have any other questions.
Reuben

On Mon, Dec 14, 2015 at 12:22 PM, Stephen Bly <bl...@foursquare.com> wrote: