Posted to mapreduce-issues@hadoop.apache.org by "Neemesh (JIRA)" <ji...@apache.org> on 2016/05/31 12:36:13 UTC

[jira] [Updated] (MAPREDUCE-6707) Running a MapReduce job to update an HBase table with data from a 1.1 GB txt/csv file in HDFS

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neemesh updated MAPREDUCE-6707:
-------------------------------
    Description: 
I am trying to read a 1.1 GB data file from HDFS and, based on its contents, do Puts to an HBase table. The mapper works fine, but the reducer gets to about 85% and then fails with the following error:

java.lang.OutOfMemoryError: Java heap space
        at java.io.DataOutputStream.<init>(DataOutputStream.java:204)
        at org.apache.jute.BinaryOutputArchive.getArchive(BinaryOutputArchive.java:38)
        at org.apache.zookeeper.ClientCnxn$Packet.createBB(ClientCnxn.java:282)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:115)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
16/05/31 03:37:50 WARN mapred.LocalJobRunner: job_local1952932043_0001
java.lang.OutOfMemoryError: Java heap space
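Two notes on this trace, plus a sketch. The OOM merely surfaces in the ZooKeeper client thread; the allocation that fails there is incidental, and the heap was already exhausted elsewhere. A common cause of reducer-side OOM in HBase load jobs is collecting Put objects in a list and flushing them at the end, instead of writing each one to the context as it is produced. Below is a minimal sketch of a streaming TableReducer; the class name, column family, and qualifier are hypothetical, not taken from the actual job, and it assumes the HBase 1.x API (Put.addColumn):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class CsvToHBaseReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
    private static final byte[] CF = Bytes.toBytes("cf");       // hypothetical column family
    private static final byte[] COL = Bytes.toBytes("value");   // hypothetical qualifier

    @Override
    protected void reduce(Text rowKey, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        byte[] row = Bytes.toBytes(rowKey.toString());
        for (Text value : values) {
            Put put = new Put(row);
            put.addColumn(CF, COL, Bytes.toBytes(value.toString()));
            // Write each Put immediately so TableOutputFormat can stream it out;
            // never accumulate Puts in a heap-resident collection.
            context.write(new ImmutableBytesWritable(row), put);
        }
    }
}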

The following configuration is set for running this MapReduce job:

configuration.set("mapreduce.child.java.opts", "-Xmx16g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit");

configuration.set("mapreduce.reduce.memory.mb", "8192m");

configuration.set("mapred.reduce.child.java.opts", "-Xmx10g");

I tried reducing the input size with some split logic. What I noticed is that with a file of up to 350 MB the MapReduce job works without issue, but with anything larger than 350 MB we get the OOM exception.

Can someone please let us know what configuration we are missing, or what changes are required, to run this MapReduce job?
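For reference, the usual way to wire a reducer to an HBase table is through TableMapReduceUtil, which sets up TableOutputFormat and ships the HBase configuration with the job, so each Put is written straight to the table. A sketch with hypothetical table, path, and class names (CsvLineMapper and CsvToHBaseReducer are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class CsvToHBaseDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "csv-to-hbase");       // hypothetical job name
        job.setJarByClass(CsvToHBaseDriver.class);
        job.setMapperClass(CsvLineMapper.class);               // hypothetical mapper class
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));  // input file on HDFS
        // Configures the reducer, TableOutputFormat, and the target table in one call.
        TableMapReduceUtil.initTableReducerJob("my_table", CsvToHBaseReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}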


> Running a MapReduce job to update an HBase table with data from a 1.1 GB txt/csv file in HDFS
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6707
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6707
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Neemesh
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org