Posted to user@hive.apache.org by Dima Machlin <Di...@pursway.com> on 2014/06/23 10:18:27 UTC

Hive 0.12 Mapjoin and MapJoinMemoryExhaustionException

Hello,
We are running Hive 0.12 with the hive.auto.convert.join feature enabled and the following settings:
hive.auto.convert.join.noconditionaltask.size = 50000000
hive.mapjoin.followby.gby.localtask.max.memory.usage = 0.7
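For reproduction, the equivalent session-level settings (same values as above; just a sketch, not our exact script) would be:

set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=50000000;
set hive.mapjoin.followby.gby.localtask.max.memory.usage=0.7;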

The query is a map join followed by a group by, like so:

select id, x, max(y)
from (
  select t1.id, t1.x, t2.y from tbl1 t1 join tbl2 t2 on (t1.id = t2.id)
) z
group by id, x;
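
For reference, the conversion does kick in here: prefixing the query with EXPLAIN shows the local hashtable stage and a Map Join Operator in the plan:

explain
select id, x, max(y)
from (
  select t1.id, t1.x, t2.y from tbl1 t1 join tbl2 t2 on (t1.id = t2.id)
) z
group by id, x;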


While executing a join against a table that has ~3M rows, we fail with:

org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2014-06-10 04:42:21    Processing rows:        2500000 Hashtable size: 2499999 Memory usage:704765184        percentage:     0.701
        at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)

This is understandable, since we passed the 70% limit.
But the table only takes 35 MB in HDFS, and somehow loading it into the hash table inflates it drastically, until it finally fails after reaching ~700 MB.

So this is the first question - why does it take so much space in memory?
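
Some rough arithmetic from the exception's own numbers (the interpretation is my guess, not something from the logs): 704765184 bytes / 2499999 hashtable entries is about 280 bytes per in-memory entry, versus 35 MB / ~3M rows, i.e. about 12 bytes per row on disk. Deserialized Java key/value objects plus per-entry hashtable overhead could plausibly explain an inflation of that order, but it still seems extreme.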

Later, I tried increasing hive.mapjoin.followby.gby.localtask.max.memory.usage to let the map join finish. By doing so I ran into another problem.
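That is, per session, something like the following (0.85 is only an illustrative value, we tried several):

set hive.mapjoin.followby.gby.localtask.max.memory.usage=0.85;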
The table is in fact fully loaded into memory, as seen here:

Processing rows:        2900000 Hashtable size: 2899999 Memory usage:   818590784       percentage:     0.815
INFO exec.HashTableSinkOperator: 2014-05-28 12:16:42  Processing rows:        2900000 Hashtable size: 2899999 Memory usage:   818590784       percentage:   0.815
INFO exec.TableScanOperator: 0 finished. closing...
INFO exec.TableScanOperator: 0 forwarded 2946773 rows
INFO exec.HashTableSinkOperator: 1 finished. closing...
INFO exec.HashTableSinkOperator: Temp URI for side table: file:/tmp/hadoop/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-local-10004/HashTable-Stage-2
Dump the side-table into file: file:/tmp/hadoop/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-local-10004/HashTable-Stage-2/MapJoin-mapfile691--.hashtable
INFO exec.HashTableSinkOperator: 2014-05-28 12:16:42  Dump the side-table into file: file:/tmp/hadoop/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-local-10004/HashTable-Stage-2/MapJoin-mapfile691--.hashtable
Upload 1 File to: file:/tmp/hadoop/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-local-10004/HashTable-Stage-2/MapJoin-mapfile691--.hashtable
INFO exec.HashTableSinkOperator: 2014-05-28 12:16:45  Upload 1 File to: file:/tmp/hadoop/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-local-10004/HashTable-Stage-2/MapJoin-mapfile691--.hashtable
INFO exec.HashTableSinkOperator: 1 forwarded 0 rows
INFO exec.HashTableSinkOperator: 1 forwarded 0 rows
INFO exec.TableScanOperator: 0 Close done
End of local task; Time Taken: 10.745 sec.

But then the join stage hangs for a long time and fails with an OOM.

From the logs, I can see that it hangs on this line:

2014-05-28 12:16:58,229 INFO org.apache.hadoop.hive.ql.exec.MapJoinOperator: ******* Load from HashTable File: input : maprfs:/user/hadoop/tmp/hive/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-mr-10003/000000_0
2014-05-28 12:16:58,230 INFO org.apache.hadoop.hive.ql.exec.MapJoinOperator:           Load back 1 hashtable file from tmp file uri:/tmp/mapr-hadoop/mapred/local/taskTracker/hadoop/distcache/-479500712399318067_367753608_1109273133/maprfs/user/hadoop/tmp/hive/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-mr-10005/HashTable-Stage-2/Stage-2.tar.gz/MapJoin-mapfile691--.hashtable
2014-05-28 12:18:31,302 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 finished. closing...

It hangs on "Load back 1 hashtable file from tmp" for 1 minute and 33 seconds, and then we get the exception:


2014-05-28 12:18:31,304 WARN org.apache.hadoop.ipc.Client: Unexpected error reading responses on connection Thread[IPC Client (47) connection to /127.0.0.1:48520 from job_201405191528_9910,5,main]
java.lang.OutOfMemoryError: Java heap space
                at java.lang.StringBuffer.toString(StringBuffer.java:585)
                at org.apache.hadoop.io.UTF8.readString(UTF8.java:209)
                at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:179)
                at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
                at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:829)
                at org.apache.hadoop.ipc.Client$Connection.run(Client.java:725)
2014-05-28 12:18:31,306 INFO org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Call to /127.0.0.1:48520 failed on local exception: java.io.IOException: Error reading responses
                at org.apache.hadoop.ipc.Client.wrapException(Client.java:1136)
                at org.apache.hadoop.ipc.Client.call(Client.java:1098)
                at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:275)
                at $Proxy0.ping(Unknown Source)
                at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:680)
                at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Error reading responses
                at org.apache.hadoop.ipc.Client$Connection.run(Client.java:732)
Caused by: java.lang.OutOfMemoryError: Java heap space
                at java.lang.StringBuffer.toString(StringBuffer.java:585)
                at org.apache.hadoop.io.UTF8.readString(UTF8.java:209)
                at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:179)
                at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
                at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:829)
                at org.apache.hadoop.ipc.Client$Connection.run(Client.java:725)

Port 48520 on 127.0.0.1 is the TaskTracker.

The file the local stage uploaded, "MapJoin-mapfile691--.hashtable", is only 87 MB.
The archive it is packed into, "Stage-2.tar.gz", is only 23 MB.

What's going on here? Why can't the join complete successfully?
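
For now, the only escape hatch I can see is disabling the conversion for this query and falling back to a plain reduce-side join:

set hive.auto.convert.join=false;

but I'd rather understand why the map join path breaks.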

Lastly, I tried removing the group by from the query. After doing so, the query finishes with no problem (with hive.mapjoin.followby.gby.localtask.max.memory.usage set above 0.82).
No hangs or anything.
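
For clarity, the variant that does complete is just the inner join on its own:

select t1.id, t1.x, t2.y
from tbl1 t1
join tbl2 t2 on (t1.id = t2.id);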

How can the group by affect the "Load back 1 hashtable file from tmp" step in any way?

Thanks in advance for any answers/comments.
-----------------------------------------------
Dima Machlin, Big Data Architect
15 Abba Eban Blvd. PO Box 4125, Herzliya 46140 IL
P: +972-9-9518147 |M: +972-54-5671337|F: +972-9-9584736
Pursway.com<http://www.pursway.com/>


Re: Hive 0.12 Mapjoin and MapJoinMemoryExhaustionException

Posted by Nagarjuna Vissarapu <na...@gmail.com>.
Can you please check your HDFS space first? If it is fine, please increase
the Java heap memory in the hive-env.sh file.
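
For example (the value here is only illustrative, size it for your nodes):

# in $HIVE_HOME/conf/hive-env.sh
export HADOOP_HEAPSIZE=2048   # heap size, in MB, for the JVMs that Hive launches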


On Mon, Jun 23, 2014 at 2:00 AM, Dima Machlin <Di...@pursway.com> wrote:

> [quoted text of the original post and earlier replies snipped]



-- 
With Thanks & Regards
Nagarjuna Vissarapu
9052179339

RE: Hive 0.12 Mapjoin and MapJoinMemoryExhaustionException

Posted by Dima Machlin <Di...@pursway.com>.
I don’t see how this is “the same” or even remotely related to my issue.
It would be better to send it in a separate mail with a different, informative subject.

From: Matouk IFTISSEN [mailto:matouk.iftissen@ysance.com]
Sent: Monday, June 23, 2014 11:49 AM
To: user@hive.apache.org
Subject: Re: Hive 0.12 Mapjoin and MapJoinMemoryExhaustionException

[quoted text of Matouk's message and the original post snipped; Matouk's message appears in full below]

Re: Hive 0.12 Mapjoin and MapJoinMemoryExhaustionException

Posted by Matouk IFTISSEN <ma...@ysance.com>.
Hello,
I have the same problem, but it shows up in a different manner:
the map reaches 100%,
the reduce reaches 100%, and then the reduce drops back to 75%!!
I use a lag function in Hive; the table (my_first_table) has 15 million rows:

INSERT INTO TABLE my_table
select *,
  case when nouvelle_tache = '1' then 'pas de rejeu'
  else if(lag(opg_id,1) OVER (PARTITION BY opg_par_id order by date_execution) is null,
          opg_id,
          lag(opg_id,1) OVER (PARTITION BY opg_par_id order by date_execution))
  end opg_par_id_1,
  others_columns
from my_first_table
--- to limit the number of rows; I thought this was a memory problem, but no, because I have a lot of free memory
where column5 > '37123T0104-10510' and column5 <= '69191R0025-10162'
order by column5;

There is no error in the log. Please help - what is wrong??



Here is the detail from the tracker (full log): [inline attachment not available in the plain-text archive]


Regards


2014-06-23 10:18 GMT+02:00 Dima Machlin <Di...@pursway.com>:

> [quoted text of the original post snipped]



-- 

Matouk IFTISSEN | Consultant BI & Big Data
24 rue du sentier - 75002 Paris - www.ysance.com <http://www.ysance.com/>
Fax : +33 1 73 72 97 26
Ysance sur : Twitter <http://twitter.com/ysance> | Facebook <https://www.facebook.com/pages/Ysance/131036788697> | Google+ <https://plus.google.com/u/0/b/115710923959357341736/115710923959357341736/posts> | LinkedIn <http://www.linkedin.com/company/ysance> | Newsletter <http://www.ysance.com/nous-contacter.html>
Nos autres sites : ys4you <http://wwww.ys4you.com/> | labdecisionnel <http://www.labdecisionnel.com/> | decrypt <http://decrypt.ysance.com/>