Posted to user@hive.apache.org by Viral Bajaria <vi...@gmail.com> on 2011/04/08 05:06:32 UTC

debugging tips

Hello,

I have been trying to optimize one of my longer-running queries using a
MAPJOIN hint. The query is fairly complex and joins my base table (1+
billion rows) with multiple metadata tables (which are relatively
small).

I already use a STREAMTABLE hint for my large table and have provided
multiple MAPJOIN hints, one for each metadata table.

Everything goes smoothly until the last step. Each map/reduce step now
finishes in under 10 minutes; before the hints it used to take 4+ hours.
The last step, however, does not complete even a single map task and
fails with the following exception:

2011-04-07 19:09:49,254 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:188)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:218)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
	... 4 more

I know this looks more like a Hadoop-side exception, but since the jars
that run the job are auto-generated, I don't know where best to start
debugging this issue. Any pointers?

Also, when adding multiple hints, do I just comma-separate them, or is
there something else that I need to take care of? For example:

SELECT /*+ STREAMTABLE(t1), MAPJOIN(t2), MAPJOIN(t3), MAPJOIN(t4) */ .....
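
To make the hint question concrete, here is a sketch of the two forms I am choosing between. The column names and join keys (id, t2_id, name and so on) are hypothetical placeholders, not my actual schema, and I am not certain the two forms are treated identically by the planner; that is part of what I am asking.

```sql
-- Hypothetical schema: t1 is the large base table;
-- t2, t3 and t4 are the small metadata tables.

-- Form 1: one hint item per table, comma-separated.
SELECT /*+ STREAMTABLE(t1), MAPJOIN(t2), MAPJOIN(t3), MAPJOIN(t4) */
       t1.id, t2.name, t3.label, t4.category
FROM t1
JOIN t2 ON (t1.t2_id = t2.id)
JOIN t3 ON (t1.t3_id = t3.id)
JOIN t4 ON (t1.t4_id = t4.id);

-- Form 2: a single MAPJOIN hint listing all of the small tables.
SELECT /*+ STREAMTABLE(t1), MAPJOIN(t2, t3, t4) */
       t1.id, t2.name, t3.label, t4.category
FROM t1
JOIN t2 ON (t1.t2_id = t2.id)
JOIN t3 ON (t1.t3_id = t3.id)
JOIN t4 ON (t1.t4_id = t4.id);
```

Prefixing either statement with EXPLAIN should show whether the joins are actually planned as map joins.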

Thanks,

Viral