Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2013/02/13 10:40:12 UTC

[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

    [ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577438#comment-13577438 ] 

Amareshwari Sriramadasu commented on HIVE-4018:
-----------------------------------------------

The setup is as follows:

We have 7 dimension tables, dim1 through dim7. Their row counts, in order, are 1009530, 3, 227358, 238514, 519, 203841 and 47.
The query is:

{noformat}
 Select SUM(msr1), SUM(msr2) , ....
 from fact 
 Left outer join dim1 on  fact.d1= dim1.id
 Left outer join dim2 on  dim1.id2 = dim2.id
 Left outer Join dim3 on  fact.d3= dim3.id1
 Left outer Join dim4 on  dim3.id3= dim4.id4
 Left outer join dim5 on  dim4.id5= dim5.id
 Left outer Join dim6 on  dim3.id6= dim6.id
 Left outer Join dim7 on  dim6.id7 = dim7.id;
{noformat}

Here is the log of the local task loading the hash tables. I'm seeing an NPE while dumping one of the tables:
{noformat}
2013-02-13 09:04:47	Starting to launch local task to process map join;	maximum memory = 1004929024
2013-02-13 09:04:48	Processing rows:	519	Hashtable size:	519	Memory usage:	11845496	rate:	0.012
2013-02-13 09:04:48	Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile21--.hashtable
2013-02-13 09:04:48	Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile21--.hashtable File size: 31191
2013-02-13 09:04:49	Processing rows:	200000	Hashtable size:	199999	Memory usage:	60980296	rate:	0.061
2013-02-13 09:04:54	Processing rows:	200000	Hashtable size:	199999	Memory usage:	156217016	rate:	0.155
2013-02-13 09:05:01	Processing rows:	300000	Hashtable size:	299999	Memory usage:	202205440	rate:	0.201
2013-02-13 09:05:05	Processing rows:	400000	Hashtable size:	399999	Memory usage:	260133024	rate:	0.259
2013-02-13 09:05:10	Processing rows:	500000	Hashtable size:	499999	Memory usage:	293007176	rate:	0.292
2013-02-13 09:05:14	Processing rows:	600000	Hashtable size:	599999	Memory usage:	347795184	rate:	0.346
2013-02-13 09:05:22	Processing rows:	700000	Hashtable size:	699999	Memory usage:	388323912	rate:	0.386
2013-02-13 09:05:28	Processing rows:	800000	Hashtable size:	799999	Memory usage:	453952824	rate:	0.452
2013-02-13 09:05:34	Processing rows:	900000	Hashtable size:	899999	Memory usage:	482001544	rate:	0.48
2013-02-13 09:05:43	Processing rows:	1000000	Hashtable size:	999999	Memory usage:	539703480	rate:	0.537
2013-02-13 09:05:47	Processing rows:	1009530	Hashtable size:	1009530	Memory usage:	530473664	rate:	0.528
2013-02-13 09:05:47	Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile61--.hashtable
2013-02-13 09:06:29	Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile61--.hashtable File size: 148246102
2013-02-13 09:06:31	Processing rows:	258054	Hashtable size:	54213	Memory usage:	111883448	rate:	0.111
2013-02-13 09:06:31	Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile31--.hashtable
2013-02-13 09:06:33	Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile31--.hashtable File size: 4251559
2013-02-13 09:06:34	Processing rows:	258054	Hashtable size:	203841	Memory usage:	72276192	rate:	0.072
2013-02-13 09:06:34	Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile32--.hashtable
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.writeExternal(MapJoinObjectValue.java:138)
	at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1443)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1414)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
	at java.util.HashMap.writeObject(HashMap.java:1018)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:959)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1480)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
	at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.flushMemoryCacheToPersistent(HashMapWrapper.java:116)
	at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.closeOp(HashTableSinkOperator.java:415)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:607)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:616)
	at org.apache.hadoop.hive.ql.exec.MapredLocalTask.startForward(MapredLocalTask.java:324)
	at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:276)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
2013-02-13 09:06:34	Processing rows:	47	Hashtable size:	47	Memory usage:	72554224	rate:	0.072
2013-02-13 09:06:34	Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile11--.hashtable
2013-02-13 09:06:34	Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile11--.hashtable File size: 2908
2013-02-13 09:06:37	Processing rows:	200000	Hashtable size:	199999	Memory usage:	154624680	rate:	0.154
2013-02-13 09:06:38	Processing rows:	227358	Hashtable size:	227358	Memory usage:	165643352	rate:	0.165
2013-02-13 09:06:38	Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile41--.hashtable
2013-02-13 09:06:46	Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile41--.hashtable File size: 34351618
2013-02-13 09:06:47	Processing rows:	3	Hashtable size:	3	Memory usage:	74456192	rate:	0.074
2013-02-13 09:06:47	Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile51--.hashtable
2013-02-13 09:06:47	Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile51--.hashtable File size: 457
2013-02-13 09:06:47	End of local task; Time Taken: 119.326 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin

{noformat}
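
Note that in the log above, the dump of MapJoin-mapfile32--.hashtable has no matching "Upload 1 File" line, yet the local task still reports success. One possible link to the EOFException in the description (an assumption on my part, not verified against the Hive code) is that a dump aborted partway through leaves an incomplete serialized file, which then fails when it is read back. The sketch below reproduces that pattern with plain java.io only. The class TruncatedDumpDemo, its Value type and the helper methods are hypothetical names made up for illustration; they are not Hive's MapJoinObjectValue or HashMapWrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.HashMap;

class TruncatedDumpDemo {

    /** Hypothetical stand-in for a hash table value; NOT Hive's MapJoinObjectValue. */
    public static class Value implements Externalizable {
        private String payload;

        public Value() { }                         // public no-arg constructor required by Externalizable

        public Value(String payload) { this.payload = payload; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            int length = payload.length();         // throws NullPointerException when payload is null
            out.writeInt(length);
            out.writeUTF(payload);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            in.readInt();                          // length written above
            payload = in.readUTF();
        }
    }

    /** A table in which one value cannot be serialized. */
    static HashMap<Integer, Value> sampleTable() {
        HashMap<Integer, Value> table = new HashMap<>();
        table.put(1, new Value("a"));
        table.put(2, new Value(null));             // this entry triggers the NPE during the dump
        table.put(3, new Value("c"));
        return table;
    }

    /** Serializes the table; a failure is only printed, and whatever was written is kept. */
    static byte[] dump(HashMap<Integer, Value> table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        try {
            out.writeObject(table);
        } catch (NullPointerException e) {
            // Assumption for this sketch: the error is reported but the task carries on.
            System.out.println("dump failed: " + e);
        } finally {
            out.close();                           // flushes the partial stream
        }
        return bytes.toByteArray();
    }

    /** Returns null if the dump deserializes cleanly, otherwise the exception that was thrown. */
    static Exception tryLoad(byte[] dump) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(dump))) {
            in.readObject();
            return null;
        } catch (IOException | ClassNotFoundException e) {
            return e;
        }
    }

    public static void main(String[] args) throws IOException {
        HashMap<Integer, Value> good = new HashMap<>();
        good.put(1, new Value("a"));
        System.out.println("complete dump -> " + tryLoad(dump(good)));
        System.out.println("aborted dump  -> " + tryLoad(dump(sampleTable())));
    }
}
```

If the same thing happens in the local task, the mappers would be handed a hash table file that ends early, which would be consistent with loadHashTable failing on read. This only illustrates the failure pattern; which field is null at MapJoinObjectValue.java:138 still needs to be checked in the Hive source.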
                
> MapJoin failing with Distributed Cache error
> --------------------------------------------
>
>                 Key: HIVE-4018
>                 URL: https://issues.apache.org/jira/browse/HIVE-4018
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 0.11.0
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.11.0
>
>
> When I'm running a star join query after HIVE-3784, it fails with the following error:
> 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: Load Distributed Cache Error
> 2013-02-13 08:36:04,585 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
> 	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
> 	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
> 	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
> 	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:416)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:260)
