Posted to common-user@hadoop.apache.org by nagarjuna kanamarlapudi <na...@gmail.com> on 2013/03/24 11:10:53 UTC

Child JVM memory allocation / Usage

Hi,

I configured my child JVM heap to 2 GB, so I thought I could read roughly
1.5 GB of data and hold it in memory (in the mapper/reducer).

To confirm this, I wrote the following piece of code in the configure
method of my mapper:

@Override
public void configure(JobConf job) {
  System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
  System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
}


Surprisingly, the output was:


FREE MEMORY -- 341854864  = 320 MB
MAX MEMORY ---1908932608  = 1.9 GB


I am just wondering what is taking up that extra ~1.6 GB of the heap
which I configured for the child JVM.


I'd appreciate any help in understanding this scenario.



Regards

Nagarjuna K

Re: Child JVM memory allocation / Usage

Posted by Ted <r6...@gmail.com>.
I configure those in hadoop-env.sh, so I'm not sure about your configuration.

You can check with tools like jconsole, or, since you're coding it anyway,
use the third memory call on Runtime, i.e. totalMemory().
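
For reference, here is a minimal standalone sketch (plain Java, nothing
Hadoop-specific; the class name is just for illustration) of those three
calls and the figures derived from them. Note that freeMemory() is relative
to the committed heap (totalMemory()), not to maxMemory():

import java.util.Locale;

public class HeapProbe {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // ceiling the heap may grow to (-Xmx)
        long committed = rt.totalMemory();  // heap currently reserved by the JVM
        long free = rt.freeMemory();        // unused part of the *committed* heap only
        long used = committed - free;       // actually occupied by objects
        System.out.printf(Locale.ROOT,
            "max=%d MB, committed=%d MB, used=%d MB, approx allocatable=%d MB%n",
            max >> 20, committed >> 20, used >> 20, (max - used) >> 20);
    }
}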

On 3/25/13, nagarjuna kanamarlapudi <na...@gmail.com> wrote:
> Hi Ted,
>
> As far as I can recollect, I only configured these parameters:
>
> <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx2048m</value>
>         <description>this number is the number of megabytes of memory that
> each mapper and each reducers will have available to use. If jobs start
> running out of heap space, this may need to be increased.</description>
> </property>
>
> <property>
>     <name>mapred.child.ulimit</name>
>     <value>3145728</value>
>         <description>this number is the number of kilobytes of memory that
> each mapper and each reducer will have available to use. If jobs start
> running out of heap space, this may need to be increased.</description>
> </property>
>
>
>
> On Mon, Mar 25, 2013 at 6:57 AM, Ted <r6...@gmail.com> wrote:
>
>> did you set the min heap size == your max heap size? if you didn't,
>> free memory only shows you the difference between used and commit, not
>> used and max.
>>
>> On 3/24/13, nagarjuna kanamarlapudi <na...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I configured  my child jvm heap to 2 GB. So, I thought I could really
>> read
>> > 1.5GB of data and store it in memory (mapper/reducer).
>> >
>> > I wanted to confirm the same and wrote the following piece of code in
>> > the
>> > configure method of mapper.
>> >
>> > @Override
>> >
>> > public void configure(JobConf job) {
>> >
>> > System.out.println("FREE MEMORY -- "
>> >
>> > + Runtime.getRuntime().freeMemory());
>> >
>> > System.out.println("MAX MEMORY ---" +
>> > Runtime.getRuntime().maxMemory());
>> >
>> > }
>> >
>> >
>> > Surprisingly the output was
>> >
>> >
>> > FREE MEMORY -- 341854864  = 320 MB
>> > MAX MEMORY ---1908932608  = 1.9 GB
>> >
>> >
>> > I am just wondering what processes are taking up that extra 1.6GB of
>> > heap which I configured for the child jvm heap.
>> >
>> >
>> > Appreciate in helping me understand the scenario.
>> >
>> >
>> >
>> > Regards
>> >
>> > Nagarjuna K
>> >
>>
>>
>> --
>> Ted.
>>
>


-- 
Ted.

Re: Child JVM memory allocation / Usage

Posted by nagarjuna kanamarlapudi <na...@gmail.com>.
Hi Ted,

As far as I can recollect, I only configured these parameters:

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
        <description>this number is the number of megabytes of memory that
each mapper and each reducers will have available to use. If jobs start
running out of heap space, this may need to be increased.</description>
</property>

<property>
    <name>mapred.child.ulimit</name>
    <value>3145728</value>
        <description>this number is the number of kilobytes of memory that
each mapper and each reducer will have available to use. If jobs start
running out of heap space, this may need to be increased.</description>
</property>



On Mon, Mar 25, 2013 at 6:57 AM, Ted <r6...@gmail.com> wrote:

> did you set the min heap size == your max heap size? if you didn't,
> free memory only shows you the difference between used and commit, not
> used and max.
>
> On 3/24/13, nagarjuna kanamarlapudi <na...@gmail.com>
> wrote:
> > Hi,
> >
> > I configured  my child jvm heap to 2 GB. So, I thought I could really
> read
> > 1.5GB of data and store it in memory (mapper/reducer).
> >
> > I wanted to confirm the same and wrote the following piece of code in the
> > configure method of mapper.
> >
> > @Override
> >
> > public void configure(JobConf job) {
> >
> > System.out.println("FREE MEMORY -- "
> >
> > + Runtime.getRuntime().freeMemory());
> >
> > System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
> >
> > }
> >
> >
> > Surprisingly the output was
> >
> >
> > FREE MEMORY -- 341854864  = 320 MB
> > MAX MEMORY ---1908932608  = 1.9 GB
> >
> >
> > I am just wondering what processes are taking up that extra 1.6GB of
> > heap which I configured for the child jvm heap.
> >
> >
> > Appreciate in helping me understand the scenario.
> >
> >
> >
> > Regards
> >
> > Nagarjuna K
> >
>
>
> --
> Ted.
>

Re: Child JVM memory allocation / Usage

Posted by Ted <r6...@gmail.com>.
Did you set the min heap size == your max heap size? If you didn't,
free memory only shows you the difference between used and committed
memory, not between used and max.
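
For example, setting both -Xms and -Xmx in the child options makes the
committed heap equal to the max from the start, so freeMemory() then
reflects the full 2 GB. A minimal sketch of a job driver doing that
(MR1 property name as used in this thread; the driver class name and the
rest of the job setup are hypothetical placeholders):

import org.apache.hadoop.mapred.JobConf;

public class MyJobDriver {
    public static void main(String[] args) {
        JobConf conf = new JobConf(MyJobDriver.class);
        // Fix the child heap size: min == max, so the whole 2 GB is committed up front.
        conf.set("mapred.child.java.opts", "-Xms2048m -Xmx2048m");
        // ... set mapper/reducer classes, input/output paths, then submit the job as usual.
    }
}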

On 3/24/13, nagarjuna kanamarlapudi <na...@gmail.com> wrote:
> Hi,
>
> I configured  my child jvm heap to 2 GB. So, I thought I could really read
> 1.5GB of data and store it in memory (mapper/reducer).
>
> I wanted to confirm the same and wrote the following piece of code in the
> configure method of mapper.
>
> @Override
>
> public void configure(JobConf job) {
>
> System.out.println("FREE MEMORY -- "
>
> + Runtime.getRuntime().freeMemory());
>
> System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
>
> }
>
>
> Surprisingly the output was
>
>
> FREE MEMORY -- 341854864  = 320 MB
> MAX MEMORY ---1908932608  = 1.9 GB
>
>
> I am just wondering what processes are taking up that extra 1.6GB of
> heap which I configured for the child jvm heap.
>
>
> Appreciate in helping me understand the scenario.
>
>
>
> Regards
>
> Nagarjuna K
>


-- 
Ted.

Re: Child JVM memory allocation / Usage

Posted by nagarjuna kanamarlapudi <na...@gmail.com>.
Awesome,

It's working well now. I need to start analysing why only ~300 MB is free
out of the configured 1.9 GB heap for the mappers and reducers.
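
In case it helps with that analysis, here is a minimal sketch (plain JDK
API, nothing Hadoop-specific; the same calls could also go into the
mapper's configure method) that breaks the heap figures down a bit further
than the Runtime calls used earlier in this thread:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapBreakdown {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // init = heap requested at startup (-Xms, may be -1 if undefined),
        // committed = currently reserved, used = occupied by objects
        // (live + not yet collected), max = ceiling (-Xmx).
        System.out.println("init=" + (heap.getInit() >> 20) + " MB"
            + ", used=" + (heap.getUsed() >> 20) + " MB"
            + ", committed=" + (heap.getCommitted() >> 20) + " MB"
            + ", max=" + (heap.getMax() >> 20) + " MB");
    }
}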


On Wed, Mar 27, 2013 at 3:25 PM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:

> Hi,
>
> >> "Dumping heap to ./heapdump.hprof"
>
> >> File myheapdump.hprof does not exist.
>
> The file names don't match - can you check your script / command line args.
>
> Thanks
> hemanth
>
>
> On Wed, Mar 27, 2013 at 3:21 PM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
>
>> Hi Hemanth,
>>
>> Nice to see this. I did not know about this till now.
>>
>> But one more issue: the dump file did not get created. The
>> following are the logs:
>>
>>
>>
>> attempt_201302211510_81218_m_000000_0:
>> /data/1/mapred/local/taskTracker/distcache/8776089957260881514_-363500746_715125253/cmp111wcd/user/ims-b/nagarjuna/AddressId_Extractor/Numbers
>> attempt_201302211510_81218_m_000000_0: java.lang.OutOfMemoryError: Java
>> heap space
>> attempt_201302211510_81218_m_000000_0: Dumping heap to ./heapdump.hprof
>> ...
>> attempt_201302211510_81218_m_000000_0: Heap dump file created [210641441
>> bytes in 3.778 secs]
>> attempt_201302211510_81218_m_000000_0: #
>> attempt_201302211510_81218_m_000000_0: # java.lang.OutOfMemoryError: Java
>> heap space
>> attempt_201302211510_81218_m_000000_0: #
>> -XX:OnOutOfMemoryError="./dump.sh"
>> attempt_201302211510_81218_m_000000_0: #   Executing /bin/sh -c
>> "./dump.sh"...
>> attempt_201302211510_81218_m_000000_0: put: File myheapdump.hprof does
>> not exist.
>> attempt_201302211510_81218_m_000000_0: log4j:WARN No appenders could be
>> found for logger (org.apache.hadoop.hdfs.DFSClient).
>>
>>
>>
>>
>>
>> On Wed, Mar 27, 2013 at 2:29 PM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> Couple of things to check:
>>>
>>> Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
>>> interface ? You can look at an example at (
>>> http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0).
>>> That's what accepts the -D params on command line. Alternatively, you can
>>> also set the same in the configuration object like this, in your launcher
>>> code:
>>>
>>> Configuration conf = new Configuration()
>>>
>>> conf.set("mapred.create.symlink", "yes");
>>>
>>>
>>>
>>> conf.set("mapred.cache.files", "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
>>>
>>>
>>>
>>> conf.set("mapred.child.java.opts",
>>>
>>>
>>>
>>>   "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./copy_dump.sh");
>>>
>>>
>>> Second, the position of the arguments matters. I think the command
>>> should be
>>>
>>> hadoop jar -Dmapred.create.symlink=yes -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>> com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>>>
>>> Thanks
>>> Hemanth
>>>
>>>
>>> On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>
>>>> Hi Hemanth/Koji,
>>>>
>>>> It seems the above script doesn't work for me. Can you look into the
>>>> following and suggest what more I can do?
>>>>
>>>>
>>>>  hadoop fs -cat /user/ims-b/dump.sh
>>>> #!/bin/sh
>>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>>>>
>>>>
>>>> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>>>>  -Dmapred.create.symlink=yes
>>>> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>
>>>>
>>>> I am not able to see the heap dump at  /tmp/myheapdump_ims
>>>>
>>>>
>>>>
>>>> Error in the mapper:
>>>>
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>>> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>> 	... 17 more
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>> 	at java.util.Arrays.copyOf(Arrays.java:2734)
>>>> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>>> 	at java.util.ArrayList.add(ArrayList.java:351)
>>>> 	at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
>>>> 	... 22 more
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
>>>> yhemanth@thoughtworks.com> wrote:
>>>>
>>>>> Koji,
>>>>>
>>>>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>>>>> with your script today !
>>>>>
>>>>> Hemanth
>>>>>
>>>>>
>>>>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <kn...@yahoo-inc.com>wrote:
>>>>>
>>>>>> Create a dump.sh on hdfs.
>>>>>>
>>>>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>>>>> #!/bin/sh
>>>>>> hadoop dfs -put myheapdump.hprof
>>>>>> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>>>>
>>>>>> Run your job with
>>>>>>
>>>>>> -Dmapred.create.symlink=yes
>>>>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>>>
>>>>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>>>>
>>>>>> Koji
>>>>>>
>>>>>>
>>>>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>>>>
>>>>>> > Hi,
>>>>>> >
>>>>>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately,
>>>>>> like I suspected, the dump goes to the current work directory of the task
>>>>>> attempt as it executes on the cluster. This directory is cleaned up once
>>>>>> the task is done. There are options to keep failed task files or task files
>>>>>> matching a pattern. However, these are NOT retaining the current working
>>>>>> directory. Hence, there is no option to get this from a cluster AFAIK.
>>>>>> >
>>>>>> > You are effectively left with the jmap option on pseudo distributed
>>>>>> cluster I think.
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > If your task is running out of memory, you could add the option
>>>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>>>> > to mapred.child.java.opts (along with the heap memory). However, I
>>>>>> am not sure  where it stores the dump.. You might need to experiment a
>>>>>> little on it.. Will try and send out the info if I get time to try out.
>>>>>> >
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > Hi hemanth,
>>>>>> >
>>>>>> > This sounds interesting, I will try that out on the pseudo
>>>>>> cluster. But the real problem for me is that the cluster is maintained
>>>>>> by a third party. I only have an edge node through which I can submit the
>>>>>> jobs.
>>>>>> >
>>>>>> > Is there any other way of getting the dump instead of physically
>>>>>> going to that machine and  checking out.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > One option to find what could be taking the memory is to use jmap
>>>>>> on the running task. The steps I followed are:
>>>>>> >
>>>>>> > - I ran a sleep job (which comes in the examples jar of the
>>>>>> distribution - effectively does nothing in the mapper / reducer).
>>>>>> > - From the JobTracker UI looked at a map task attempt ID.
>>>>>> > - Then on the machine where the map task is running, got the PID of
>>>>>> the running task - ps -ef | grep <task attempt id>
>>>>>> > - On the same machine executed jmap -histo <pid>
>>>>>> >
>>>>>> > This will give you an idea of the count of objects allocated and
>>>>>> size. Jmap also has options to get a dump, that will contain more
>>>>>> information, but this should help to get you started with debugging.
>>>>>> >
>>>>>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>>>>>> >
>>>>>> > Thanks
>>>>>> > hemanth
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > I have a lookup file which I need in the mapper. So I am trying to
>>>>>> read the whole file and load it into list in the mapper.
>>>>>> >
>>>>>> >
>>>>>> > For each and every record I look in this file, which I got from
>>>>>> distributed cache.
>>>>>> >
>>>>>> > —
>>>>>> > Sent from iPhone
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> >
>>>>>> > Hmm. How are you loading the file into memory ? Is it some sort of
>>>>>> memory mapping etc ? Are they being read as records ? Some details of the
>>>>>> app will help
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > Hi Hemanth,
>>>>>> >
>>>>>> > I tried out your suggestion loading 420 MB file into memory. It
>>>>>> threw java heap space error.
>>>>>> >
>>>>>> > I am not sure where this 1.6 GB of configured heap went to ?
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > The free memory might be low, just because GC hasn't reclaimed what
>>>>>> it can. Can you just try reading in the data you want to read and see if
>>>>>> that works ?
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > io.sort.mb = 256 MB
>>>>>> >
>>>>>> >
>>>>>> > On Monday, March 25, 2013, Harsh J wrote:
>>>>>> > The MapTask may consume some memory of its own as well. What is your
>>>>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>> >
>>>>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>>> > <na...@gmail.com> wrote:
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > I configured  my child jvm heap to 2 GB. So, I thought I could
>>>>>> really read
>>>>>> > > 1.5GB of data and store it in memory (mapper/reducer).
>>>>>> > >
>>>>>> > > I wanted to confirm the same and wrote the following piece of
>>>>>> code in the
>>>>>> > > configure method of mapper.
>>>>>> > >
>>>>>> > > @Override
>>>>>> > >
>>>>>> > > public void configure(JobConf job) {
>>>>>> > >
>>>>>> > > System.out.println("FREE MEMORY -- "
>>>>>> > >
>>>>>> > > + Runtime.getRuntime().freeMemory());
>>>>>> > >
>>>>>> > > System.out.println("MAX MEMORY ---" +
>>>>>> Runtime.getRuntime().maxMemory());
>>>>>> > >
>>>>>> > > }
>>>>>> > >
>>>>>> > >
>>>>>> > > Surprisingly the output was
>>>>>> > >
>>>>>> > >
>>>>>> > > FREE MEMORY -- 341854864  = 320 MB
>>>>>> > > MAX MEMORY ---1908932608  = 1.9 GB
>>>>>> > >
>>>>>> > >
>>>>>> > > I am just wondering what processes are taking up that extra 1.6GB
>>>>>> of heap
>>>>>> > > which I configured for the child jvm heap.
>>>>>> > >
>>>>>> > >
>>>>>> > > Appreciate in helping me understand the scenario.
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > Regards
>>>>>> > >
>>>>>> > > Nagarjuna K
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Harsh J
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Sent from iPhone
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

>>>>>>
>>>>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>>>>> #!/bin/sh
>>>>>> hadoop dfs -put myheapdump.hprof
>>>>>> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>>>>
>>>>>> Run your job with
>>>>>>
>>>>>> -Dmapred.create.symlink=yes
>>>>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>>>
>>>>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>>>>
>>>>>> Koji
>>>>>>
>>>>>>
>>>>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>>>>
>>>>>> > Hi,
>>>>>> >
>>>>>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately,
>>>>>> like I suspected, the dump goes to the current work directory of the task
>>>>>> attempt as it executes on the cluster. This directory is cleaned up once
>>>>>> the task is done. There are options to keep failed task files or task files
>>>>>> matching a pattern. However, these are NOT retaining the current working
>>>>>> directory. Hence, there is no option to get this from a cluster AFAIK.
>>>>>> >
>>>>>> > You are effectively left with the jmap option on pseudo distributed
>>>>>> cluster I think.
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > If your task is running out of memory, you could add the option
>>>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>>>> > to mapred.child.java.opts (along with the heap memory). However, I
>>>>>> am not sure  where it stores the dump.. You might need to experiment a
>>>>>> little on it.. Will try and send out the info if I get time to try out.
>>>>>> >
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > Hi hemanth,
>>>>>> >
>>>>>> > This sounds interesting, will out try out that on the pseudo
>>>>>> cluster.  But the real problem for me is, the cluster is being maintained
>>>>>> by third party. I only have have a edge node through which I can submit the
>>>>>> jobs.
>>>>>> >
>>>>>> > Is there any other way of getting the dump instead of physically
>>>>>> going to that machine and  checking out.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > One option to find what could be taking the memory is to use jmap
>>>>>> on the running task. The steps I followed are:
>>>>>> >
>>>>>> > - I ran a sleep job (which comes in the examples jar of the
>>>>>> distribution - effectively does nothing in the mapper / reducer).
>>>>>> > - From the JobTracker UI looked at a map task attempt ID.
>>>>>> > - Then on the machine where the map task is running, got the PID of
>>>>>> the running task - ps -ef | grep <task attempt id>
>>>>>> > - On the same machine executed jmap -histo <pid>
>>>>>> >
>>>>>> > This will give you an idea of the count of objects allocated and
>>>>>> size. Jmap also has options to get a dump, that will contain more
>>>>>> information, but this should help to get you started with debugging.
>>>>>> >
>>>>>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>>>>>> >
>>>>>> > Thanks
>>>>>> > hemanth
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > I have a lookup file which I need in the mapper. So I am trying to
>>>>>> read the whole file and load it into list in the mapper.
>>>>>> >
>>>>>> >
>>>>>> > For each and every record Iook in this file which I got from
>>>>>> distributed cache.
>>>>>> >
>>>>>> > —
>>>>>> > Sent from iPhone
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> >
>>>>>> > Hmm. How are you loading the file into memory ? Is it some sort of
>>>>>> memory mapping etc ? Are they being read as records ? Some details of the
>>>>>> app will help
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > Hi Hemanth,
>>>>>> >
>>>>>> > I tried out your suggestion loading 420 MB file into memory. It
>>>>>> threw java heap space error.
>>>>>> >
>>>>>> > I am not sure where this 1.6 GB of configured heap went to ?
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > The free memory might be low, just because GC hasn't reclaimed what
>>>>>> it can. Can you just try reading in the data you want to read and see if
>>>>>> that works ?
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > io.sort.mb = 256 MB
>>>>>> >
>>>>>> >
>>>>>> > On Monday, March 25, 2013, Harsh J wrote:
>>>>>> > The MapTask may consume some memory of its own as well. What is your
>>>>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>> >
>>>>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>>> > <na...@gmail.com> wrote:
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > I configured  my child jvm heap to 2 GB. So, I thought I could
>>>>>> really read
>>>>>> > > 1.5GB of data and store it in memory (mapper/reducer).
>>>>>> > >
>>>>>> > > I wanted to confirm the same and wrote the following piece of
>>>>>> code in the
>>>>>> > > configure method of mapper.
>>>>>> > >
>>>>>> > > @Override
>>>>>> > >
>>>>>> > > public void configure(JobConf job) {
>>>>>> > >
>>>>>> > > System.out.println("FREE MEMORY -- "
>>>>>> > >
>>>>>> > > + Runtime.getRuntime().freeMemory());
>>>>>> > >
>>>>>> > > System.out.println("MAX MEMORY ---" +
>>>>>> Runtime.getRuntime().maxMemory());
>>>>>> > >
>>>>>> > > }
>>>>>> > >
>>>>>> > >
>>>>>> > > Surprisingly the output was
>>>>>> > >
>>>>>> > >
>>>>>> > > FREE MEMORY -- 341854864  = 320 MB
>>>>>> > > MAX MEMORY ---1908932608  = 1.9 GB
>>>>>> > >
>>>>>> > >
>>>>>> > > I am just wondering what processes are taking up that extra 1.6GB
>>>>>> of heap
>>>>>> > > which I configured for the child jvm heap.
>>>>>> > >
>>>>>> > >
>>>>>> > > Appreciate in helping me understand the scenario.
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > Regards
>>>>>> > >
>>>>>> > > Nagarjuna K
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Harsh J
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Sent from iPhone
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

>> "Dumping heap to ./heapdump.hprof"

>> File myheapdump.hprof does not exist.

The file names don't match - can you check your script / command line args.
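
For instance, a minimal sketch of keeping the two names aligned on the
launcher side could look like this (same MR1 property keys as in the quoted
mail below; the HDFS script location is just an example):

Configuration conf = new Configuration();

// Symlink the upload script from HDFS into the task's working directory.
conf.set("mapred.create.symlink", "yes");
conf.set("mapred.cache.files", "hdfs:///user/ims-b/dump.sh#dump.sh");

// Use one file name everywhere: the JVM writes ./heapdump.hprof on OOM,
// so dump.sh must also 'hadoop dfs -put heapdump.hprof ...' that same name.
conf.set("mapred.child.java.opts",
    "-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError"
    + " -XX:HeapDumpPath=./heapdump.hprof"
    + " -XX:OnOutOfMemoryError=./dump.sh");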

Thanks
hemanth


On Wed, Mar 27, 2013 at 3:21 PM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> Hi Hemanth,
>
> Nice to see this. I didnot know about this till now.
>
> But few one more issue.. the dump file did not get created..   The
> following are the logs
>
>
>
> ttempt_201302211510_81218_m_000000_0:
> /data/1/mapred/local/taskTracker/distcache/8776089957260881514_-363500746_715125253/cmp111wcd/user/ims-b/nagarjuna/AddressId_Extractor/Numbers
> attempt_201302211510_81218_m_000000_0: java.lang.OutOfMemoryError: Java
> heap space
> attempt_201302211510_81218_m_000000_0: Dumping heap to ./heapdump.hprof ...
> attempt_201302211510_81218_m_000000_0: Heap dump file created [210641441
> bytes in 3.778 secs]
> attempt_201302211510_81218_m_000000_0: #
> attempt_201302211510_81218_m_000000_0: # java.lang.OutOfMemoryError: Java
> heap space
> attempt_201302211510_81218_m_000000_0: # -XX:OnOutOfMemoryError="./dump.sh"
> attempt_201302211510_81218_m_000000_0: #   Executing /bin/sh -c
> "./dump.sh"...
> attempt_201302211510_81218_m_000000_0: put: File myheapdump.hprof does not
> exist.
> attempt_201302211510_81218_m_000000_0: log4j:WARN No appenders could be
> found for logger (org.apache.hadoop.hdfs.DFSClient).
>
>
>
>
>
> On Wed, Mar 27, 2013 at 2:29 PM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Couple of things to check:
>>
>> Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
>> interface ? You can look at an example at (
>> http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0).
>> That's what accepts the -D params on command line. Alternatively, you can
>> also set the same in the configuration object like this, in your launcher
>> code:
>>
>> Configuration conf = new Configuration()
>>
>> conf.set("mapred.create.symlink", "yes");
>>
>>
>> conf.set("mapred.cache.files", "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
>>
>>
>> conf.set("mapred.child.java.opts",
>>
>>
>>   "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./copy_dump.sh");
>>
>>
>> Second, the position of the arguments matters. I think the command should
>> be
>>
>> hadoop jar -Dmapred.create.symlink=yes -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>> com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>>
>> Thanks
>> Hemanth
>>
>>
>> On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>
>>> Hi Hemanth/Koji,
>>>
>>> Seems the above script doesn't work for me.  Can u look into the
>>> following and suggest what more can I do
>>>
>>>
>>>  hadoop fs -cat /user/ims-b/dump.sh
>>> #!/bin/sh
>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>>>
>>>
>>> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>>>  -Dmapred.create.symlink=yes
>>> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>
>>>
>>> I am not able to see the heap dump at  /tmp/myheapdump_ims
>>>
>>>
>>>
>>> Erorr in the mapper :
>>>
>>> Caused by: java.lang.reflect.InvocationTargetException
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>> 	... 17 more
>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>> 	at java.util.Arrays.copyOf(Arrays.java:2734)
>>> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>> 	at java.util.ArrayList.add(ArrayList.java:351)
>>> 	at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
>>> 	... 22 more
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>>
>>>> Koji,
>>>>
>>>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>>>> with your script today !
>>>>
>>>> Hemanth
>>>>
>>>>
>>>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <kn...@yahoo-inc.com>wrote:
>>>>
>>>>> Create a dump.sh on hdfs.
>>>>>
>>>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>>>> #!/bin/sh
>>>>> hadoop dfs -put myheapdump.hprof
>>>>> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>>>
>>>>> Run your job with
>>>>>
>>>>> -Dmapred.create.symlink=yes
>>>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>>
>>>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>>>
>>>>> Koji
>>>>>
>>>>>
>>>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately,
>>>>> like I suspected, the dump goes to the current work directory of the task
>>>>> attempt as it executes on the cluster. This directory is cleaned up once
>>>>> the task is done. There are options to keep failed task files or task files
>>>>> matching a pattern. However, these are NOT retaining the current working
>>>>> directory. Hence, there is no option to get this from a cluster AFAIK.
>>>>> >
>>>>> > You are effectively left with the jmap option on pseudo distributed
>>>>> cluster I think.
>>>>> >
>>>>> > Thanks
>>>>> > Hemanth
>>>>> >
>>>>> >
>>>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>>>> yhemanth@thoughtworks.com> wrote:
>>>>> > If your task is running out of memory, you could add the option
>>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>>> > to mapred.child.java.opts (along with the heap memory). However, I
>>>>> am not sure  where it stores the dump.. You might need to experiment a
>>>>> little on it.. Will try and send out the info if I get time to try out.
>>>>> >
>>>>> >
>>>>> > Thanks
>>>>> > Hemanth
>>>>> >
>>>>> >
>>>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>> > Hi hemanth,
>>>>> >
>>>>> > This sounds interesting, will out try out that on the pseudo
>>>>> cluster.  But the real problem for me is, the cluster is being maintained
>>>>> by third party. I only have have a edge node through which I can submit the
>>>>> jobs.
>>>>> >
>>>>> > Is there any other way of getting the dump instead of physically
>>>>> going to that machine and  checking out.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>>>> yhemanth@thoughtworks.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > One option to find what could be taking the memory is to use jmap on
>>>>> the running task. The steps I followed are:
>>>>> >
>>>>> > - I ran a sleep job (which comes in the examples jar of the
>>>>> distribution - effectively does nothing in the mapper / reducer).
>>>>> > - From the JobTracker UI looked at a map task attempt ID.
>>>>> > - Then on the machine where the map task is running, got the PID of
>>>>> the running task - ps -ef | grep <task attempt id>
>>>>> > - On the same machine executed jmap -histo <pid>
>>>>> >
>>>>> > This will give you an idea of the count of objects allocated and
>>>>> size. Jmap also has options to get a dump, that will contain more
>>>>> information, but this should help to get you started with debugging.
>>>>> >
>>>>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>>>>> >
>>>>> > Thanks
>>>>> > hemanth
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>> > I have a lookup file which I need in the mapper. So I am trying to
>>>>> read the whole file and load it into list in the mapper.
>>>>> >
>>>>> >
>>>>> > For each and every record Iook in this file which I got from
>>>>> distributed cache.
>>>>> >
>>>>> > —
>>>>> > Sent from iPhone
>>>>> >
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>>>> yhemanth@thoughtworks.com> wrote:
>>>>> >
>>>>> > Hmm. How are you loading the file into memory ? Is it some sort of
>>>>> memory mapping etc ? Are they being read as records ? Some details of the
>>>>> app will help
>>>>> >
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>> > Hi Hemanth,
>>>>> >
>>>>> > I tried out your suggestion loading 420 MB file into memory. It
>>>>> threw java heap space error.
>>>>> >
>>>>> > I am not sure where this 1.6 GB of configured heap went to ?
>>>>> >
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>>> yhemanth@thoughtworks.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > The free memory might be low, just because GC hasn't reclaimed what
>>>>> it can. Can you just try reading in the data you want to read and see if
>>>>> that works ?
>>>>> >
>>>>> > Thanks
>>>>> > Hemanth
>>>>> >
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>> > io.sort.mb = 256 MB
>>>>> >
>>>>> >
>>>>> > On Monday, March 25, 2013, Harsh J wrote:
>>>>> > The MapTask may consume some memory of its own as well. What is your
>>>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>> >
>>>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>> > <na...@gmail.com> wrote:
>>>>> > > Hi,
>>>>> > >
>>>>> > > I configured  my child jvm heap to 2 GB. So, I thought I could
>>>>> really read
>>>>> > > 1.5GB of data and store it in memory (mapper/reducer).
>>>>> > >
>>>>> > > I wanted to confirm the same and wrote the following piece of code
>>>>> in the
>>>>> > > configure method of mapper.
>>>>> > >
>>>>> > > @Override
>>>>> > >
>>>>> > > public void configure(JobConf job) {
>>>>> > >
>>>>> > > System.out.println("FREE MEMORY -- "
>>>>> > >
>>>>> > > + Runtime.getRuntime().freeMemory());
>>>>> > >
>>>>> > > System.out.println("MAX MEMORY ---" +
>>>>> Runtime.getRuntime().maxMemory());
>>>>> > >
>>>>> > > }
>>>>> > >
>>>>> > >
>>>>> > > Surprisingly the output was
>>>>> > >
>>>>> > >
>>>>> > > FREE MEMORY -- 341854864  = 320 MB
>>>>> > > MAX MEMORY ---1908932608  = 1.9 GB
>>>>> > >
>>>>> > >
>>>>> > > I am just wondering what processes are taking up that extra 1.6GB
>>>>> of heap
>>>>> > > which I configured for the child jvm heap.
>>>>> > >
>>>>> > >
>>>>> > > Appreciate in helping me understand the scenario.
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > Regards
>>>>> > >
>>>>> > > Nagarjuna K
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Harsh J
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Sent from iPhone
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Child JVM memory allocation / Usage

Posted by nagarjuna kanamarlapudi <na...@gmail.com>.
Hi Hemanth,

Nice to see this. I did not know about this till now.

But one more issue: the dump file did not get created. The
following are the logs:



ttempt_201302211510_81218_m_000000_0:
/data/1/mapred/local/taskTracker/distcache/8776089957260881514_-363500746_715125253/cmp111wcd/user/ims-b/nagarjuna/AddressId_Extractor/Numbers
attempt_201302211510_81218_m_000000_0: java.lang.OutOfMemoryError: Java
heap space
attempt_201302211510_81218_m_000000_0: Dumping heap to ./heapdump.hprof ...
attempt_201302211510_81218_m_000000_0: Heap dump file created [210641441
bytes in 3.778 secs]
attempt_201302211510_81218_m_000000_0: #
attempt_201302211510_81218_m_000000_0: # java.lang.OutOfMemoryError: Java
heap space
attempt_201302211510_81218_m_000000_0: # -XX:OnOutOfMemoryError="./dump.sh"
attempt_201302211510_81218_m_000000_0: #   Executing /bin/sh -c
"./dump.sh"...
attempt_201302211510_81218_m_000000_0: put: File myheapdump.hprof does not
exist.
attempt_201302211510_81218_m_000000_0: log4j:WARN No appenders could be
found for logger (org.apache.hadoop.hdfs.DFSClient).





On Wed, Mar 27, 2013 at 2:29 PM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:

> Couple of things to check:
>
> Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
> interface ? You can look at an example at (
> http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0).
> That's what accepts the -D params on command line. Alternatively, you can
> also set the same in the configuration object like this, in your launcher
> code:
>
> Configuration conf = new Configuration()
>
> conf.set("mapred.create.symlink", "yes");
>
> conf.set("mapred.cache.files", "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
>
> conf.set("mapred.child.java.opts",
>
>   "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./copy_dump.sh");
>
>
> Second, the position of the arguments matters. I think the command should
> be
>
> hadoop jar -Dmapred.create.symlink=yes -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
> com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>
> Thanks
> Hemanth
>
>
> On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
>
>> Hi Hemanth/Koji,
>>
>> Seems the above script doesn't work for me.  Can u look into the
>> following and suggest what more can I do
>>
>>
>>  hadoop fs -cat /user/ims-b/dump.sh
>> #!/bin/sh
>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>>
>>
>> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>>  -Dmapred.create.symlink=yes
>> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>
>>
>> I am not able to see the heap dump at  /tmp/myheapdump_ims
>>
>>
>>
>> Erorr in the mapper :
>>
>> Caused by: java.lang.reflect.InvocationTargetException
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>> 	... 17 more
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>> 	at java.util.Arrays.copyOf(Arrays.java:2734)
>> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>> 	at java.util.ArrayList.add(ArrayList.java:351)
>> 	at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
>> 	... 22 more
>>
>>
>>
>>
>>
>> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> Koji,
>>>
>>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>>> with your script today !
>>>
>>> Hemanth
>>>
>>>
>>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <kn...@yahoo-inc.com>wrote:
>>>
>>>> Create a dump.sh on hdfs.
>>>>
>>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>>> #!/bin/sh
>>>> hadoop dfs -put myheapdump.hprof
>>>> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>>
>>>> Run your job with
>>>>
>>>> -Dmapred.create.symlink=yes
>>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>
>>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>>
>>>> Koji
>>>>
>>>>
>>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately,
>>>> like I suspected, the dump goes to the current work directory of the task
>>>> attempt as it executes on the cluster. This directory is cleaned up once
>>>> the task is done. There are options to keep failed task files or task files
>>>> matching a pattern. However, these are NOT retaining the current working
>>>> directory. Hence, there is no option to get this from a cluster AFAIK.
>>>> >
>>>> > You are effectively left with the jmap option on pseudo distributed
>>>> cluster I think.
>>>> >
>>>> > Thanks
>>>> > Hemanth
>>>> >
>>>> >
>>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>>> yhemanth@thoughtworks.com> wrote:
>>>> > If your task is running out of memory, you could add the option
>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>> > to mapred.child.java.opts (along with the heap memory). However, I am
>>>> not sure  where it stores the dump.. You might need to experiment a little
>>>> on it.. Will try and send out the info if I get time to try out.
>>>> >
>>>> >
>>>> > Thanks
>>>> > Hemanth
>>>> >
>>>> >
>>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>> > Hi hemanth,
>>>> >
>>>> > This sounds interesting, will out try out that on the pseudo cluster.
>>>>  But the real problem for me is, the cluster is being maintained by third
>>>> party. I only have have a edge node through which I can submit the jobs.
>>>> >
>>>> > Is there any other way of getting the dump instead of physically
>>>> going to that machine and  checking out.
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>>> yhemanth@thoughtworks.com> wrote:
>>>> > Hi,
>>>> >
>>>> > One option to find what could be taking the memory is to use jmap on
>>>> the running task. The steps I followed are:
>>>> >
>>>> > - I ran a sleep job (which comes in the examples jar of the
>>>> distribution - effectively does nothing in the mapper / reducer).
>>>> > - From the JobTracker UI looked at a map task attempt ID.
>>>> > - Then on the machine where the map task is running, got the PID of
>>>> the running task - ps -ef | grep <task attempt id>
>>>> > - On the same machine executed jmap -histo <pid>
>>>> >
>>>> > This will give you an idea of the count of objects allocated and
>>>> size. Jmap also has options to get a dump, that will contain more
>>>> information, but this should help to get you started with debugging.
>>>> >
>>>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>>>> >
>>>> > Thanks
>>>> > hemanth
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>> > I have a lookup file which I need in the mapper. So I am trying to
>>>> read the whole file and load it into list in the mapper.
>>>> >
>>>> >
>>>> > For each and every record Iook in this file which I got from
>>>> distributed cache.
>>>> >
>>>> > —
>>>> > Sent from iPhone
>>>> >
>>>> >
>>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>>> yhemanth@thoughtworks.com> wrote:
>>>> >
>>>> > Hmm. How are you loading the file into memory ? Is it some sort of
>>>> memory mapping etc ? Are they being read as records ? Some details of the
>>>> app will help
>>>> >
>>>> >
>>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>> > Hi Hemanth,
>>>> >
>>>> > I tried out your suggestion loading 420 MB file into memory. It threw
>>>> java heap space error.
>>>> >
>>>> > I am not sure where this 1.6 GB of configured heap went to ?
>>>> >
>>>> >
>>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>> yhemanth@thoughtworks.com> wrote:
>>>> > Hi,
>>>> >
>>>> > The free memory might be low, just because GC hasn't reclaimed what
>>>> it can. Can you just try reading in the data you want to read and see if
>>>> that works ?
>>>> >
>>>> > Thanks
>>>> > Hemanth
>>>> >
>>>> >
>>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>> > io.sort.mb = 256 MB
>>>> >
>>>> >
>>>> > On Monday, March 25, 2013, Harsh J wrote:
>>>> > The MapTask may consume some memory of its own as well. What is your
>>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>> >
>>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>> > <na...@gmail.com> wrote:
>>>> > > Hi,
>>>> > >
>>>> > > I configured  my child jvm heap to 2 GB. So, I thought I could
>>>> really read
>>>> > > 1.5GB of data and store it in memory (mapper/reducer).
>>>> > >
>>>> > > I wanted to confirm the same and wrote the following piece of code
>>>> in the
>>>> > > configure method of mapper.
>>>> > >
>>>> > > @Override
>>>> > >
>>>> > > public void configure(JobConf job) {
>>>> > >
>>>> > > System.out.println("FREE MEMORY -- "
>>>> > >
>>>> > > + Runtime.getRuntime().freeMemory());
>>>> > >
>>>> > > System.out.println("MAX MEMORY ---" +
>>>> Runtime.getRuntime().maxMemory());
>>>> > >
>>>> > > }
>>>> > >
>>>> > >
>>>> > > Surprisingly the output was
>>>> > >
>>>> > >
>>>> > > FREE MEMORY -- 341854864  = 320 MB
>>>> > > MAX MEMORY ---1908932608  = 1.9 GB
>>>> > >
>>>> > >
>>>> > > I am just wondering what processes are taking up that extra 1.6GB
>>>> of heap
>>>> > > which I configured for the child jvm heap.
>>>> > >
>>>> > >
>>>> > > Appreciate in helping me understand the scenario.
>>>> > >
>>>> > >
>>>> > >
>>>> > > Regards
>>>> > >
>>>> > > Nagarjuna K
>>>> > >
>>>> > >
>>>> > >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Harsh J
>>>> >
>>>> >
>>>> > --
>>>> > Sent from iPhone
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>

Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Couple of things to check:

Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
interface? You can look at an example at (
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0).
That's what makes the -D params on the command line take effect. Alternatively,
you can also set the same properties on the configuration object in your
launcher code, like this:

Configuration conf = new Configuration();

conf.set("mapred.create.symlink", "yes");
conf.set("mapred.cache.files",
    "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
conf.set("mapred.child.java.opts",
    "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError"
    + " -XX:HeapDumpPath=./heapdump.hprof"
    + " -XX:OnOutOfMemoryError=./copy_dump.sh");
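
For reference, a minimal launcher that goes through ToolRunner could look
roughly like the sketch below. The class name, job name and job setup are
only illustrative; they are not the actual
com.hadoop.publicationMrPOC.Launcher code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Illustrative only: shows the Tool/ToolRunner wiring, not the real job setup.
public class MyLauncher extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains whatever generic options (-D, -files, ...)
    // ToolRunner parsed off the command line.
    JobConf job = new JobConf(getConf(), MyLauncher.class);
    job.setJobName("publication-poc");
    // ... set mapper, reducer, input and output paths from args here ...
    JobClient.runJob(job);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner strips the generic options before handing the remaining
    // arguments to run().
    System.exit(ToolRunner.run(new Configuration(), new MyLauncher(), args));
  }
}

With that wiring in place, the -D options are applied to the job
configuration before run() sees your own arguments.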


Second, the position of the arguments matters: the generic -D options are
picked up only if they come before your own application arguments. I think
the command should be

hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher
-Dmapred.create.symlink=yes
-Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
-Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
Fudan\ Univ

Thanks
Hemanth


On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> Hi Hemanth/Koji,
>
> Seems the above script doesn't work for me.  Can u look into the following
> and suggest what more can I do
>
>
>  hadoop fs -cat /user/ims-b/dump.sh
> #!/bin/sh
> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>
>
> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>  -Dmapred.create.symlink=yes
> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>
>
> I am not able to see the heap dump at  /tmp/myheapdump_ims
>
>
>
> Erorr in the mapper :
>
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> 	... 17 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2734)
> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> 	at java.util.ArrayList.add(ArrayList.java:351)
> 	at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
> 	... 22 more
>
>
>
>
>
> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Koji,
>>
>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>> with your script today !
>>
>> Hemanth
>>
>>
>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <kn...@yahoo-inc.com>wrote:
>>
>>> Create a dump.sh on hdfs.
>>>
>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>> #!/bin/sh
>>> hadoop dfs -put myheapdump.hprof
>>> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>
>>> Run your job with
>>>
>>> -Dmapred.create.symlink=yes
>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>
>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>
>>> Koji
>>>
>>>
>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>
>>> > Hi,
>>> >
>>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately,
>>> like I suspected, the dump goes to the current work directory of the task
>>> attempt as it executes on the cluster. This directory is cleaned up once
>>> the task is done. There are options to keep failed task files or task files
>>> matching a pattern. However, these are NOT retaining the current working
>>> directory. Hence, there is no option to get this from a cluster AFAIK.
>>> >
>>> > You are effectively left with the jmap option on pseudo distributed
>>> cluster I think.
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>> > If your task is running out of memory, you could add the option
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> > to mapred.child.java.opts (along with the heap memory). However, I am
>>> not sure  where it stores the dump.. You might need to experiment a little
>>> on it.. Will try and send out the info if I get time to try out.
>>> >
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > Hi hemanth,
>>> >
>>> > This sounds interesting, will out try out that on the pseudo cluster.
>>>  But the real problem for me is, the cluster is being maintained by third
>>> party. I only have have a edge node through which I can submit the jobs.
>>> >
>>> > Is there any other way of getting the dump instead of physically going
>>> to that machine and  checking out.
>>> >
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>> > Hi,
>>> >
>>> > One option to find what could be taking the memory is to use jmap on
>>> the running task. The steps I followed are:
>>> >
>>> > - I ran a sleep job (which comes in the examples jar of the
>>> distribution - effectively does nothing in the mapper / reducer).
>>> > - From the JobTracker UI looked at a map task attempt ID.
>>> > - Then on the machine where the map task is running, got the PID of
>>> the running task - ps -ef | grep <task attempt id>
>>> > - On the same machine executed jmap -histo <pid>
>>> >
>>> > This will give you an idea of the count of objects allocated and size.
>>> Jmap also has options to get a dump, that will contain more information,
>>> but this should help to get you started with debugging.
>>> >
>>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>>> >
>>> > Thanks
>>> > hemanth
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > I have a lookup file which I need in the mapper. So I am trying to
>>> read the whole file and load it into list in the mapper.
>>> >
>>> >
>>> > For each and every record Iook in this file which I got from
>>> distributed cache.
>>> >
>>> > —
>>> > Sent from iPhone
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>> >
>>> > Hmm. How are you loading the file into memory ? Is it some sort of
>>> memory mapping etc ? Are they being read as records ? Some details of the
>>> app will help
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > Hi Hemanth,
>>> >
>>> > I tried out your suggestion loading 420 MB file into memory. It threw
>>> java heap space error.
>>> >
>>> > I am not sure where this 1.6 GB of configured heap went to ?
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>> > Hi,
>>> >
>>> > The free memory might be low, just because GC hasn't reclaimed what it
>>> can. Can you just try reading in the data you want to read and see if that
>>> works ?
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > io.sort.mb = 256 MB
>>> >
>>> >
>>> > On Monday, March 25, 2013, Harsh J wrote:
>>> > The MapTask may consume some memory of its own as well. What is your
>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>> >
>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>> > <na...@gmail.com> wrote:
>>> > > Hi,
>>> > >
>>> > > I configured  my child jvm heap to 2 GB. So, I thought I could
>>> really read
>>> > > 1.5GB of data and store it in memory (mapper/reducer).
>>> > >
>>> > > I wanted to confirm the same and wrote the following piece of code
>>> in the
>>> > > configure method of mapper.
>>> > >
>>> > > @Override
>>> > >
>>> > > public void configure(JobConf job) {
>>> > >
>>> > > System.out.println("FREE MEMORY -- "
>>> > >
>>> > > + Runtime.getRuntime().freeMemory());
>>> > >
>>> > > System.out.println("MAX MEMORY ---" +
>>> Runtime.getRuntime().maxMemory());
>>> > >
>>> > > }
>>> > >
>>> > >
>>> > > Surprisingly the output was
>>> > >
>>> > >
>>> > > FREE MEMORY -- 341854864  = 320 MB
>>> > > MAX MEMORY ---1908932608  = 1.9 GB
>>> > >
>>> > >
>>> > > I am just wondering what processes are taking up that extra 1.6GB of
>>> heap
>>> > > which I configured for the child jvm heap.
>>> > >
>>> > >
>>> > > Appreciate in helping me understand the scenario.
>>> > >
>>> > >
>>> > >
>>> > > Regards
>>> > >
>>> > > Nagarjuna K
>>> > >
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Harsh J
>>> >
>>> >
>>> > --
>>> > Sent from iPhone
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>
>

Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Couple of things to check:

Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
interface ? You can look at an example at (
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0).
That's what accepts the -D params on command line. Alternatively, you can
also set the same in the configuration object like this, in your launcher
code:

Configuration conf = new Configuration()

conf.set("mapred.create.symlink", "yes");
conf.set("mapred.cache.files",
"hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
conf.set("mapred.child.java.opts",
  "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=./heapdump.hprof
-XX:OnOutOfMemoryError=./copy_dump.sh");


Second, the position of the arguments matters. I think the command should
be

hadoop jar -Dmapred.create.symlink=yes
-Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
-Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ

Thanks
Hemanth


On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> Hi Hemanth/Koji,
>
> Seems the above script doesn't work for me.  Can u look into the following
> and suggest what more can I do
>
>
>  hadoop fs -cat /user/ims-b/dump.sh
> #!/bin/sh
> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>
>
> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>  -Dmapred.create.symlink=yes
> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>
>
> I am not able to see the heap dump at  /tmp/myheapdump_ims
>
>
>
> Erorr in the mapper :
>
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> 	... 17 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2734)
> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> 	at java.util.ArrayList.add(ArrayList.java:351)
> 	at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
> 	... 22 more
>
>
>
>
>
> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Koji,
>>
>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>> with your script today !
>>
>> Hemanth
>>
>>
>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <kn...@yahoo-inc.com>wrote:
>>
>>> Create a dump.sh on hdfs.
>>>
>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>> #!/bin/sh
>>> hadoop dfs -put myheapdump.hprof
>>> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>
>>> Run your job with
>>>
>>> -Dmapred.create.symlink=yes
>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>
>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>
>>> Koji
>>>
>>>
>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>
>>> > Hi,
>>> >
>>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately,
>>> like I suspected, the dump goes to the current work directory of the task
>>> attempt as it executes on the cluster. This directory is cleaned up once
>>> the task is done. There are options to keep failed task files or task files
>>> matching a pattern. However, these are NOT retaining the current working
>>> directory. Hence, there is no option to get this from a cluster AFAIK.
>>> >
>>> > You are effectively left with the jmap option on pseudo distributed
>>> cluster I think.
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>> > If your task is running out of memory, you could add the option
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> > to mapred.child.java.opts (along with the heap memory). However, I am
>>> not sure  where it stores the dump.. You might need to experiment a little
>>> on it.. Will try and send out the info if I get time to try out.
>>> >
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > Hi hemanth,
>>> >
>>> > This sounds interesting, will out try out that on the pseudo cluster.
>>>  But the real problem for me is, the cluster is being maintained by third
>>> party. I only have have a edge node through which I can submit the jobs.
>>> >
>>> > Is there any other way of getting the dump instead of physically going
>>> to that machine and  checking out.
>>> >
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>> > Hi,
>>> >
>>> > One option to find what could be taking the memory is to use jmap on
>>> the running task. The steps I followed are:
>>> >
>>> > - I ran a sleep job (which comes in the examples jar of the
>>> distribution - effectively does nothing in the mapper / reducer).
>>> > - From the JobTracker UI looked at a map task attempt ID.
>>> > - Then on the machine where the map task is running, got the PID of
>>> the running task - ps -ef | grep <task attempt id>
>>> > - On the same machine executed jmap -histo <pid>
>>> >
>>> > This will give you an idea of the count of objects allocated and size.
>>> Jmap also has options to get a dump, that will contain more information,
>>> but this should help to get you started with debugging.
>>> >
>>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>>> >
>>> > Thanks
>>> > hemanth
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > I have a lookup file which I need in the mapper. So I am trying to
>>> read the whole file and load it into list in the mapper.
>>> >
>>> >
>>> > For each and every record Iook in this file which I got from
>>> distributed cache.
>>> >
>>> > —
>>> > Sent from iPhone
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>> >
>>> > Hmm. How are you loading the file into memory ? Is it some sort of
>>> memory mapping etc ? Are they being read as records ? Some details of the
>>> app will help
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > Hi Hemanth,
>>> >
>>> > I tried out your suggestion loading 420 MB file into memory. It threw
>>> java heap space error.
>>> >
>>> > I am not sure where this 1.6 GB of configured heap went to ?
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>> > Hi,
>>> >
>>> > The free memory might be low, just because GC hasn't reclaimed what it
>>> can. Can you just try reading in the data you want to read and see if that
>>> works ?
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > io.sort.mb = 256 MB
>>> >
>>> >
>>> > On Monday, March 25, 2013, Harsh J wrote:
>>> > The MapTask may consume some memory of its own as well. What is your
>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>> >
>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>> > <na...@gmail.com> wrote:
>>> > > Hi,
>>> > >
>>> > > I configured  my child jvm heap to 2 GB. So, I thought I could
>>> really read
>>> > > 1.5GB of data and store it in memory (mapper/reducer).
>>> > >
>>> > > I wanted to confirm the same and wrote the following piece of code
>>> in the
>>> > > configure method of mapper.
>>> > >
>>> > > @Override
>>> > >
>>> > > public void configure(JobConf job) {
>>> > >
>>> > > System.out.println("FREE MEMORY -- "
>>> > >
>>> > > + Runtime.getRuntime().freeMemory());
>>> > >
>>> > > System.out.println("MAX MEMORY ---" +
>>> Runtime.getRuntime().maxMemory());
>>> > >
>>> > > }
>>> > >
>>> > >
>>> > > Surprisingly the output was
>>> > >
>>> > >
>>> > > FREE MEMORY -- 341854864  = 320 MB
>>> > > MAX MEMORY ---1908932608  = 1.9 GB
>>> > >
>>> > >
>>> > > I am just wondering what processes are taking up that extra 1.6GB of
>>> heap
>>> > > which I configured for the child jvm heap.
>>> > >
>>> > >
>>> > > Appreciate in helping me understand the scenario.
>>> > >
>>> > >
>>> > >
>>> > > Regards
>>> > >
>>> > > Nagarjuna K
>>> > >
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Harsh J
>>> >
>>> >
>>> > --
>>> > Sent from iPhone
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>
>
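
The fix suggested in this thread is to have the launcher class implement Tool, so
that ToolRunner strips the generic -D options out of the argument list before the
application arguments (Fudan\ Univ in the commands here) ever reach run(). A
minimal sketch of such a driver follows; the real Launcher class is never shown in
the thread, so the package name and the job wiring below are only placeholders.

package com.hadoop.publicationMrPOC;   // assumed package, matching the class name used in this thread

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Launcher extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already carries any -Dmapred.* options parsed by ToolRunner
    JobConf conf = new JobConf(getConf(), Launcher.class);
    conf.setJobName("publication-poc");        // placeholder job name
    // ... set mapper/reducer classes and input/output paths here ...
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // Generic options (-D, -files, -archives) must come before the
    // application arguments on the command line for this to take effect.
    System.exit(ToolRunner.run(new Configuration(), new Launcher(), args));
  }
}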


Re: Child JVM memory allocation / Usage

Posted by nagarjuna kanamarlapudi <na...@gmail.com>.
Hi Hemanth/Koji,

Seems the above script doesn't work for me. Can you look into the following
and suggest what more I can do?


 hadoop fs -cat /user/ims-b/dump.sh
#!/bin/sh
hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof


hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
 -Dmapred.create.symlink=yes
-Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
-Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'


I am not able to see the heap dump at /tmp/myheapdump_ims.



Error in the mapper:

Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 17 more
Caused by: java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2734)
	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
	at java.util.ArrayList.add(ArrayList.java:351)
	at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
	... 22 more
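
The trace fails inside ArrayList.ensureCapacity, that is, while the list's backing
array is being grown (Arrays.copyOf briefly holds both the old and the new array),
and every line kept as a java.lang.String can cost several times its on-disk size,
which is how a few hundred MB of lookup data can exhaust a heap of this size. Below
is a rough sketch of a configure() that loads the cached file while logging heap
usage as it goes; the real PublicationMapper is not shown in this thread, so the
class layout, the key/value types and the assumption that the lookup data is a
plain text file in the distributed cache are all placeholders.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical stand-in for the PublicationMapper in the trace; only the
// loading and logging pattern is the point here.
public class PublicationMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private final List<String> lookup = new ArrayList<String>();

  @Override
  public void configure(JobConf job) {
    try {
      // Files listed in mapred.cache.files are localised for the task;
      // getLocalCacheFiles() returns their paths on the local disk.
      Path[] cached = DistributedCache.getLocalCacheFiles(job);
      BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
      Runtime rt = Runtime.getRuntime();
      String line;
      long count = 0;
      while ((line = in.readLine()) != null) {
        lookup.add(line);
        if (++count % 100000 == 0) {
          // used heap = committed heap minus the free part of it
          long usedMb = (rt.totalMemory() - rt.freeMemory()) >> 20;
          System.out.println("loaded " + count + " lines, used heap ~" + usedMb + " MB");
        }
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("Could not load lookup file from distributed cache", e);
    }
  }

  @Override
  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // ... per-record lookup against the in-memory list goes here ...
  }
}

The printed counters end up in the task's stdout log, which is viewable from the
JobTracker web UI, so this works even when the cluster is only reachable through
an edge node.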





On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
yhemanth@thoughtworks.com> wrote:

> Koji,
>
> Works beautifully. Thanks a lot. I learnt at least 3 different things with
> your script today !
>
> Hemanth
>
>
> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <kn...@yahoo-inc.com>wrote:
>
>> Create a dump.sh on hdfs.
>>
>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>> #!/bin/sh
>> hadoop dfs -put myheapdump.hprof
>> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>
>> Run your job with
>>
>> -Dmapred.create.symlink=yes
>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>
>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>
>> Koji
>>
>>
>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>
>> > Hi,
>> >
>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, like
>> I suspected, the dump goes to the current work directory of the task
>> attempt as it executes on the cluster. This directory is cleaned up once
>> the task is done. There are options to keep failed task files or task files
>> matching a pattern. However, these are NOT retaining the current working
>> directory. Hence, there is no option to get this from a cluster AFAIK.
>> >
>> > You are effectively left with the jmap option on pseudo distributed
>> cluster I think.
>> >
>> > Thanks
>> > Hemanth
>> >
>> >
>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>> > If your task is running out of memory, you could add the option
>> -XX:+HeapDumpOnOutOfMemoryError
>> > to mapred.child.java.opts (along with the heap memory). However, I am
>> not sure  where it stores the dump.. You might need to experiment a little
>> on it.. Will try and send out the info if I get time to try out.
>> >
>> >
>> > Thanks
>> > Hemanth
>> >
>> >
>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>> > Hi hemanth,
>> >
>> > This sounds interesting, will out try out that on the pseudo cluster.
>>  But the real problem for me is, the cluster is being maintained by third
>> party. I only have have a edge node through which I can submit the jobs.
>> >
>> > Is there any other way of getting the dump instead of physically going
>> to that machine and  checking out.
>> >
>> >
>> >
>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>> > Hi,
>> >
>> > One option to find what could be taking the memory is to use jmap on
>> the running task. The steps I followed are:
>> >
>> > - I ran a sleep job (which comes in the examples jar of the
>> distribution - effectively does nothing in the mapper / reducer).
>> > - From the JobTracker UI looked at a map task attempt ID.
>> > - Then on the machine where the map task is running, got the PID of the
>> running task - ps -ef | grep <task attempt id>
>> > - On the same machine executed jmap -histo <pid>
>> >
>> > This will give you an idea of the count of objects allocated and size.
>> Jmap also has options to get a dump, that will contain more information,
>> but this should help to get you started with debugging.
>> >
>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>> >
>> > Thanks
>> > hemanth
>> >
>> >
>> >
>> >
>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>> > I have a lookup file which I need in the mapper. So I am trying to read
>> the whole file and load it into list in the mapper.
>> >
>> >
>> > For each and every record Iook in this file which I got from
>> distributed cache.
>> >
>> > —
>> > Sent from iPhone
>> >
>> >
>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>> >
>> > Hmm. How are you loading the file into memory ? Is it some sort of
>> memory mapping etc ? Are they being read as records ? Some details of the
>> app will help
>> >
>> >
>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>> > Hi Hemanth,
>> >
>> > I tried out your suggestion loading 420 MB file into memory. It threw
>> java heap space error.
>> >
>> > I am not sure where this 1.6 GB of configured heap went to ?
>> >
>> >
>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>> > Hi,
>> >
>> > The free memory might be low, just because GC hasn't reclaimed what it
>> can. Can you just try reading in the data you want to read and see if that
>> works ?
>> >
>> > Thanks
>> > Hemanth
>> >
>> >
>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>> > io.sort.mb = 256 MB
>> >
>> >
>> > On Monday, March 25, 2013, Harsh J wrote:
>> > The MapTask may consume some memory of its own as well. What is your
>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>> >
>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>> > <na...@gmail.com> wrote:
>> > > Hi,
>> > >
>> > > I configured  my child jvm heap to 2 GB. So, I thought I could really
>> read
>> > > 1.5GB of data and store it in memory (mapper/reducer).
>> > >
>> > > I wanted to confirm the same and wrote the following piece of code in
>> the
>> > > configure method of mapper.
>> > >
>> > > @Override
>> > >
>> > > public void configure(JobConf job) {
>> > >
>> > > System.out.println("FREE MEMORY -- "
>> > >
>> > > + Runtime.getRuntime().freeMemory());
>> > >
>> > > System.out.println("MAX MEMORY ---" +
>> Runtime.getRuntime().maxMemory());
>> > >
>> > > }
>> > >
>> > >
>> > > Surprisingly the output was
>> > >
>> > >
>> > > FREE MEMORY -- 341854864  = 320 MB
>> > > MAX MEMORY ---1908932608  = 1.9 GB
>> > >
>> > >
>> > > I am just wondering what processes are taking up that extra 1.6GB of
>> heap
>> > > which I configured for the child jvm heap.
>> > >
>> > >
>> > > Appreciate in helping me understand the scenario.
>> > >
>> > >
>> > >
>> > > Regards
>> > >
>> > > Nagarjuna K
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> > --
>> > Harsh J
>> >
>> >
>> > --
>> > Sent from iPhone
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>
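
Before digging into heap dumps, it is worth confirming that the -XX flags reached
the child JVM at all: the command at the top of this message puts the -D options
after the application argument, so unless the launcher goes through ToolRunner
(and the options come before the application arguments, as noted in the earlier
reply) they are treated as plain arguments and never applied. A small check that
can be dropped into configure() is sketched below; Configuration.get() and
RuntimeMXBean.getInputArguments() are standard calls, everything around them is an
assumption.

import java.lang.management.ManagementFactory;

import org.apache.hadoop.mapred.JobConf;

// Inside the mapper or reducer implementation:
public void configure(JobConf job) {
  // What the job configuration says should be passed to child JVMs
  System.out.println("mapred.child.java.opts = "
      + job.get("mapred.child.java.opts"));
  System.out.println("mapred.reduce.child.java.opts = "
      + job.get("mapred.reduce.child.java.opts"));
  // What this JVM was actually started with
  System.out.println("JVM args = "
      + ManagementFactory.getRuntimeMXBean().getInputArguments());
}

If the HeapDumpOnOutOfMemoryError and OnOutOfMemoryError flags do not show up in
the JVM args line, the dump.sh hook never had a chance to run, which would explain
the empty /tmp/myheapdump_ims directory.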


Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Koji,

Works beautifully. Thanks a lot. I learnt at least 3 different things with
your script today!

Hemanth


On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <kn...@yahoo-inc.com>wrote:

> Create a dump.sh on hdfs.
>
> $ hadoop dfs -cat /user/knoguchi/dump.sh
> #!/bin/sh
> hadoop dfs -put myheapdump.hprof
> /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>
> Run your job with
>
> -Dmapred.create.symlink=yes
> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>
> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>
> Koji
>
>
> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>
> > Hi,
> >
> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, like
> I suspected, the dump goes to the current work directory of the task
> attempt as it executes on the cluster. This directory is cleaned up once
> the task is done. There are options to keep failed task files or task files
> matching a pattern. However, these are NOT retaining the current working
> directory. Hence, there is no option to get this from a cluster AFAIK.
> >
> > You are effectively left with the jmap option on pseudo distributed
> cluster I think.
> >
> > Thanks
> > Hemanth
> >
> >
> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
> > If your task is running out of memory, you could add the option
> -XX:+HeapDumpOnOutOfMemoryError
> > to mapred.child.java.opts (along with the heap memory). However, I am
> not sure  where it stores the dump.. You might need to experiment a little
> on it.. Will try and send out the info if I get time to try out.
> >
> >
> > Thanks
> > Hemanth
> >
> >
> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
> > Hi hemanth,
> >
> > This sounds interesting, will out try out that on the pseudo cluster.
>  But the real problem for me is, the cluster is being maintained by third
> party. I only have have a edge node through which I can submit the jobs.
> >
> > Is there any other way of getting the dump instead of physically going
> to that machine and  checking out.
> >
> >
> >
> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
> > Hi,
> >
> > One option to find what could be taking the memory is to use jmap on the
> running task. The steps I followed are:
> >
> > - I ran a sleep job (which comes in the examples jar of the distribution
> - effectively does nothing in the mapper / reducer).
> > - From the JobTracker UI looked at a map task attempt ID.
> > - Then on the machine where the map task is running, got the PID of the
> running task - ps -ef | grep <task attempt id>
> > - On the same machine executed jmap -histo <pid>
> >
> > This will give you an idea of the count of objects allocated and size.
> Jmap also has options to get a dump, that will contain more information,
> but this should help to get you started with debugging.
> >
> > For my sleep job task - I saw allocations worth roughly 130 MB.
> >
> > Thanks
> > hemanth
> >
> >
> >
> >
> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
> > I have a lookup file which I need in the mapper. So I am trying to read
> the whole file and load it into list in the mapper.
> >
> >
> > For each and every record Iook in this file which I got from distributed
> cache.
> >
> > —
> > Sent from iPhone
> >
> >
> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
> >
> > Hmm. How are you loading the file into memory ? Is it some sort of
> memory mapping etc ? Are they being read as records ? Some details of the
> app will help
> >
> >
> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
> > Hi Hemanth,
> >
> > I tried out your suggestion loading 420 MB file into memory. It threw
> java heap space error.
> >
> > I am not sure where this 1.6 GB of configured heap went to ?
> >
> >
> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
> > Hi,
> >
> > The free memory might be low, just because GC hasn't reclaimed what it
> can. Can you just try reading in the data you want to read and see if that
> works ?
> >
> > Thanks
> > Hemanth
> >
> >
> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
> > io.sort.mb = 256 MB
> >
> >
> > On Monday, March 25, 2013, Harsh J wrote:
> > The MapTask may consume some memory of its own as well. What is your
> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
> >
> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
> > <na...@gmail.com> wrote:
> > > Hi,
> > >
> > > I configured  my child jvm heap to 2 GB. So, I thought I could really
> read
> > > 1.5GB of data and store it in memory (mapper/reducer).
> > >
> > > I wanted to confirm the same and wrote the following piece of code in
> the
> > > configure method of mapper.
> > >
> > > @Override
> > >
> > > public void configure(JobConf job) {
> > >
> > > System.out.println("FREE MEMORY -- "
> > >
> > > + Runtime.getRuntime().freeMemory());
> > >
> > > System.out.println("MAX MEMORY ---" +
> Runtime.getRuntime().maxMemory());
> > >
> > > }
> > >
> > >
> > > Surprisingly the output was
> > >
> > >
> > > FREE MEMORY -- 341854864  = 320 MB
> > > MAX MEMORY ---1908932608  = 1.9 GB
> > >
> > >
> > > I am just wondering what processes are taking up that extra 1.6GB of
> heap
> > > which I configured for the child jvm heap.
> > >
> > >
> > > Appreciate in helping me understand the scenario.
> > >
> > >
> > >
> > > Regards
> > >
> > > Nagarjuna K
> > >
> > >
> > >
> >
> >
> >
> > --
> > Harsh J
> >
> >
> > --
> > Sent from iPhone
> >
> >
> >
> >
> >
> >
> >
> >
>
>
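
A quick note on the FREE MEMORY / MAX MEMORY numbers quoted above:
Runtime.freeMemory() reports the unused part of the heap the JVM has committed
so far, not the unused part of the -Xmx ceiling, so it can look small right
after startup even though most of the configured heap is still usable. Below is
a minimal sketch of a fuller report, using the same old-API configure(JobConf)
hook as the example in the thread (the class name is invented and this is
illustrative code, not code from the thread):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class HeapReportingMapperBase extends MapReduceBase {
  @Override
  public void configure(JobConf job) {
    Runtime rt = Runtime.getRuntime();
    long max = rt.maxMemory();                // ceiling set by -Xmx in mapred.child.java.opts
    long committed = rt.totalMemory();        // heap the JVM has actually reserved so far
    long used = committed - rt.freeMemory();  // bytes currently occupied by objects
    System.out.println("HEAP used=" + used
        + " committed=" + committed + " max=" + max);
  }
}

Printed together, the three values make it clear that a low freeMemory() right
after JVM start mostly reflects how little heap has been committed yet, not how
little can still be allocated before hitting -Xmx.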

Re: Child JVM memory allocation / Usage

Posted by Koji Noguchi <kn...@yahoo-inc.com>.
Create a dump.sh on hdfs.

$ hadoop dfs -cat /user/knoguchi/dump.sh
#!/bin/sh
hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof

Run your job with 

-Dmapred.create.symlink=yes
-Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
-Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'

This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.

Koji


On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:

> Hi,
> 
> I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, like I suspected, the dump goes to the current work directory of the task attempt as it executes on the cluster. This directory is cleaned up once the task is done. There are options to keep failed task files or task files matching a pattern. However, these are NOT retaining the current working directory. Hence, there is no option to get this from a cluster AFAIK.
> 
> You are effectively left with the jmap option on pseudo distributed cluster I think.
> 
> Thanks
> Hemanth
> 
> 
> On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <yh...@thoughtworks.com> wrote:
> If your task is running out of memory, you could add the option -XX:+HeapDumpOnOutOfMemoryError 
> to mapred.child.java.opts (along with the heap memory). However, I am not sure  where it stores the dump.. You might need to experiment a little on it.. Will try and send out the info if I get time to try out.
> 
> 
> Thanks
> Hemanth
> 
> 
> On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <na...@gmail.com> wrote:
> Hi hemanth,
> 
> This sounds interesting, I will try that out on the pseudo cluster.  But the real problem for me is that the cluster is being maintained by a third party. I only have an edge node through which I can submit the jobs. 
> 
> Is there any other way of getting the dump instead of physically going to that machine and checking it out? 
> 
> 
> 
> On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <yh...@thoughtworks.com> wrote:
> Hi,
> 
> One option to find what could be taking the memory is to use jmap on the running task. The steps I followed are:
> 
> - I ran a sleep job (which comes in the examples jar of the distribution - effectively does nothing in the mapper / reducer). 
> - From the JobTracker UI looked at a map task attempt ID.
> - Then on the machine where the map task is running, got the PID of the running task - ps -ef | grep <task attempt id>
> - On the same machine executed jmap -histo <pid>
> 
> This will give you an idea of the count of objects allocated and size. Jmap also has options to get a dump, that will contain more information, but this should help to get you started with debugging.
> 
> For my sleep job task - I saw allocations worth roughly 130 MB.
> 
> Thanks
> hemanth
> 
> 
> 
> 
> On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <na...@gmail.com> wrote:
> I have a lookup file which I need in the mapper. So I am trying to read the whole file and load it into a list in the mapper. 
> 
> 
> For each and every record I look in this file, which I got from the distributed cache. 
> 
> —
> Sent from iPhone
> 
> 
> On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <yh...@thoughtworks.com> wrote:
> 
> Hmm. How are you loading the file into memory ? Is it some sort of memory mapping etc ? Are they being read as records ? Some details of the app will help
> 
> 
> On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <na...@gmail.com> wrote:
> Hi Hemanth,
> 
> I tried out your suggestion loading 420 MB file into memory. It threw java heap space error.
> 
> I am not sure where this 1.6 GB of configured heap went to ?
> 
> 
> On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <yh...@thoughtworks.com> wrote:
> Hi,
> 
> The free memory might be low, just because GC hasn't reclaimed what it can. Can you just try reading in the data you want to read and see if that works ?
> 
> Thanks
> Hemanth
> 
> 
> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <na...@gmail.com> wrote:
> io.sort.mb = 256 MB
> 
> 
> On Monday, March 25, 2013, Harsh J wrote:
> The MapTask may consume some memory of its own as well. What is your
> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
> 
> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
> <na...@gmail.com> wrote:
> > Hi,
> >
> > I configured  my child jvm heap to 2 GB. So, I thought I could really read
> > 1.5GB of data and store it in memory (mapper/reducer).
> >
> > I wanted to confirm the same and wrote the following piece of code in the
> > configure method of mapper.
> >
> > @Override
> >
> > public void configure(JobConf job) {
> >
> > System.out.println("FREE MEMORY -- "
> >
> > + Runtime.getRuntime().freeMemory());
> >
> > System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
> >
> > }
> >
> >
> > Surprisingly the output was
> >
> >
> > FREE MEMORY -- 341854864  = 320 MB
> > MAX MEMORY ---1908932608  = 1.9 GB
> >
> >
> > I am just wondering what processes are taking up that extra 1.6GB of heap
> > which I configured for the child jvm heap.
> >
> >
> > Appreciate in helping me understand the scenario.
> >
> >
> >
> > Regards
> >
> > Nagarjuna K
> >
> >
> >
> 
> 
> 
> --
> Harsh J
> 
> 
> -- 
> Sent from iPhone
> 
> 
> 
> 
> 
> 
> 
> 


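For anyone who would rather set these options from a driver program than on the
command line, a rough, untested sketch follows. It uses the same MR1 property
names and example paths as the command above; the class and method names are
invented, and hdfs:///user/knoguchi/dump.sh must be replaced with the location
of your own script.

import org.apache.hadoop.mapred.JobConf;

public class HeapDumpJobOptions {
  // Applies the equivalent of the -D flags above to a JobConf before submission.
  public static JobConf withHeapDumpOnOom(JobConf conf) {
    conf.set("mapred.create.symlink", "yes");
    conf.set("mapred.cache.files", "hdfs:///user/knoguchi/dump.sh#dump.sh");
    conf.set("mapred.reduce.child.java.opts",
        "-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError"
        + " -XX:HeapDumpPath=./myheapdump.hprof"
        + " -XX:OnOutOfMemoryError=./dump.sh");
    return conf;
  }
}

The reduce-side property is shown because that is what the example uses; for a
map-side failure the same value would go into mapred.child.java.opts, the
property discussed earlier in the thread.
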
Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, like I
suspected, the dump goes to the current work directory of the task attempt
as it executes on the cluster. This directory is cleaned up once the task
is done. There are options to keep failed task files or task files matching
a pattern. However, these are NOT retaining the current working directory.
Hence, there is no option to get this from a cluster AFAIK.

You are effectively left with the jmap option on pseudo distributed cluster
I think.

Thanks
Hemanth


On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
yhemanth@thoughtworks.com> wrote:

> If your task is running out of memory, you could add the option
> -XX:+HeapDumpOnOutOfMemoryError
> to mapred.child.java.opts (along with the heap memory). However, I am
> not sure  where it stores the dump.. You might need to experiment a little
> on it.. Will try and send out the info if I get time to try out.
>
>
> Thanks
> Hemanth
>
>
> On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
>
>> Hi hemanth,
>>
>> This sounds interesting, I will try that out on the pseudo cluster.
>>  But the real problem for me is that the cluster is being maintained by a
>> third party. I only have an edge node through which I can submit the jobs.
>>
>> Is there any other way of getting the dump instead of physically going to
>> that machine and checking it out?
>>
>>
>>
>>  On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> Hi,
>>>
>>> One option to find what could be taking the memory is to use jmap on the
>>> running task. The steps I followed are:
>>>
>>> - I ran a sleep job (which comes in the examples jar of the distribution
>>> - effectively does nothing in the mapper / reducer).
>>> - From the JobTracker UI looked at a map task attempt ID.
>>> - Then on the machine where the map task is running, got the PID of the
>>> running task - ps -ef | grep <task attempt id>
>>> - On the same machine executed jmap -histo <pid>
>>>
>>> This will give you an idea of the count of objects allocated and size.
>>> Jmap also has options to get a dump, that will contain more information,
>>> but this should help to get you started with debugging.
>>>
>>> For my sleep job task - I saw allocations worth roughly 130 MB.
>>>
>>> Thanks
>>> hemanth
>>>
>>>
>>>
>>>
>>> On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>
>>>> I have a lookup file which I need in the mapper. So I am trying to read
>>>> the whole file and load it into a list in the mapper.
>>>>
>>>> For each and every record I look in this file, which I got from the
>>>> distributed cache.
>>>>
>>>> —
>>>> Sent from iPhone
>>>>
>>>>
>>>> On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>>> yhemanth@thoughtworks.com> wrote:
>>>>
>>>>> Hmm. How are you loading the file into memory ? Is it some sort of
>>>>> memory mapping etc ? Are they being read as records ? Some details of the
>>>>> app will help
>>>>>
>>>>>
>>>>> On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>
>>>>>> Hi Hemanth,
>>>>>>
>>>>>> I tried out your suggestion loading 420 MB file into memory. It threw
>>>>>> java heap space error.
>>>>>>
>>>>>> I am not sure where this 1.6 GB of configured heap went to ?
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> The free memory might be low, just because GC hasn't reclaimed what
>>>>>>> it can. Can you just try reading in the data you want to read and see if
>>>>>>> that works ?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Hemanth
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>>>
>>>>>>>> io.sort.mb = 256 MB
>>>>>>>>
>>>>>>>>
>>>>>>>> On Monday, March 25, 2013, Harsh J wrote:
>>>>>>>>
>>>>>>>>> The MapTask may consume some memory of its own as well. What is
>>>>>>>>> your
>>>>>>>>> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>>>>>
>>>>>>>>> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>>>>>> <na...@gmail.com> wrote:
>>>>>>>>> > Hi,
>>>>>>>>> >
>>>>>>>>> > I configured  my child jvm heap to 2 GB. So, I thought I could
>>>>>>>>> really read
>>>>>>>>> > 1.5GB of data and store it in memory (mapper/reducer).
>>>>>>>>> >
>>>>>>>>> > I wanted to confirm the same and wrote the following piece of
>>>>>>>>> code in the
>>>>>>>>> > configure method of mapper.
>>>>>>>>> >
>>>>>>>>> > @Override
>>>>>>>>> >
>>>>>>>>> > public void configure(JobConf job) {
>>>>>>>>> >
>>>>>>>>> > System.out.println("FREE MEMORY -- "
>>>>>>>>> >
>>>>>>>>> > + Runtime.getRuntime().freeMemory());
>>>>>>>>> >
>>>>>>>>> > System.out.println("MAX MEMORY ---" +
>>>>>>>>> Runtime.getRuntime().maxMemory());
>>>>>>>>> >
>>>>>>>>> > }
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Surprisingly the output was
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > FREE MEMORY -- 341854864  = 320 MB
>>>>>>>>> > MAX MEMORY ---1908932608  = 1.9 GB
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > I am just wondering what processes are taking up that extra
>>>>>>>>> 1.6GB of heap
>>>>>>>>> > which I configured for the child jvm heap.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Appreciate in helping me understand the scenario.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Regards
>>>>>>>>> >
>>>>>>>>> > Nagarjuna K
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Harsh J
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sent from iPhone
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

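For the original use case, loading the lookup file from the distributed cache
into a list, a minimal sketch in the old MR1 API is below. This is not the
poster's code: the class and field names are invented, it assumes the lookup
file is the only file in the cache, and error handling is kept deliberately
small.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class LookupLoadingMapperBase extends MapReduceBase {
  private final List<String> lookup = new ArrayList<String>();

  @Override
  public void configure(JobConf job) {
    try {
      // Local paths of the files shipped via the distributed cache.
      Path[] cached = DistributedCache.getLocalCacheFiles(job);
      if (cached == null || cached.length == 0) {
        throw new RuntimeException("lookup file not found in the distributed cache");
      }
      BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          lookup.add(line);  // one String per record of the lookup file
        }
      } finally {
        reader.close();
      }
    } catch (IOException e) {
      throw new RuntimeException("failed to load lookup file from distributed cache", e);
    }
  }
}

One sizing caveat that may explain the heap space error with the 420 MB file:
text held as java.lang.String objects in an ArrayList can easily take two to
three times the on-disk size of the file, and that heap is shared with the
task's own buffers such as the 256 MB io.sort.mb sort buffer in a map task.
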
Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
If your task is running out of memory, you could add the option
-XX:+HeapDumpOnOutOfMemoryError
to mapred.child.java.opts (along with the heap setting). However, I am not
sure where it stores the dump. You might need to experiment a little with
it. I will try and send out the info if I get time to try it out.
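
For illustration, appending the flag from the job driver could look roughly
like the sketch below (MR1 JobConf API; the class name is just a placeholder,
and the -Xmx value simply mirrors the 2 GB heap already being configured in
this thread):

import org.apache.hadoop.mapred.JobConf;

public class HeapDumpJobSetup {
  public static JobConf buildConf() {
    JobConf conf = new JobConf(HeapDumpJobSetup.class);
    // Child JVM heap plus the dump-on-OOM flag; adjust -Xmx to your limits
    conf.set("mapred.child.java.opts",
        "-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError");
    return conf;
  }
}

If the driver goes through ToolRunner, the same value can also be passed on
the command line with -Dmapred.child.java.opts=... at submission time.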


Thanks
Hemanth


On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> Hi hemanth,
>
> This sounds interesting, will out try out that on the pseudo cluster.  But
> the real problem for me is, the cluster is being maintained by third party.
> I only have have a edge node through which I can submit the jobs.
>
> Is there any other way of getting the dump instead of physically going to
> that machine and  checking out.
>
>
>
> On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Hi,
>>
>> One option to find what could be taking the memory is to use jmap on the
>> running task. The steps I followed are:
>>
>> - I ran a sleep job (which comes in the examples jar of the distribution
>> - effectively does nothing in the mapper / reducer).
>> - From the JobTracker UI looked at a map task attempt ID.
>> - Then on the machine where the map task is running, got the PID of the
>> running task - ps -ef | grep <task attempt id>
>> - On the same machine executed jmap -histo <pid>
>>
>> This will give you an idea of the count of objects allocated and size.
>> Jmap also has options to get a dump, that will contain more information,
>> but this should help to get you started with debugging.
>>
>> For my sleep job task - I saw allocations worth roughly 130 MB.
>>
>> Thanks
>> hemanth
>>
>>
>>
>>
>> On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>
>>> I have a lookup file which I need in the mapper. So I am trying to read
>>> the whole file and load it into list in the mapper.
>>>
>>> For each and every record Iook in this file which I got from distributed
>>> cache.
>>>
>>> —
>>> Sent from iPhone
>>>
>>>
>>> On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>>
>>>> Hmm. How are you loading the file into memory ? Is it some sort of
>>>> memory mapping etc ? Are they being read as records ? Some details of the
>>>> app will help
>>>>
>>>>
>>>> On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>
>>>>> Hi Hemanth,
>>>>>
>>>>> I tried out your suggestion loading 420 MB file into memory. It threw
>>>>> java heap space error.
>>>>>
>>>>> I am not sure where this 1.6 GB of configured heap went to ?
>>>>>
>>>>>
>>>>> On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The free memory might be low, just because GC hasn't reclaimed what
>>>>>> it can. Can you just try reading in the data you want to read and see if
>>>>>> that works ?
>>>>>>
>>>>>> Thanks
>>>>>> Hemanth
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>>
>>>>>>> io.sort.mb = 256 MB
>>>>>>>
>>>>>>>
>>>>>>> On Monday, March 25, 2013, Harsh J wrote:
>>>>>>>
>>>>>>>> The MapTask may consume some memory of its own as well. What is your
>>>>>>>> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>>>>
>>>>>>>> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>>>>> <na...@gmail.com> wrote:
>>>>>>>> > Hi,
>>>>>>>> >
>>>>>>>> > I configured  my child jvm heap to 2 GB. So, I thought I could
>>>>>>>> really read
>>>>>>>> > 1.5GB of data and store it in memory (mapper/reducer).
>>>>>>>> >
>>>>>>>> > I wanted to confirm the same and wrote the following piece of
>>>>>>>> code in the
>>>>>>>> > configure method of mapper.
>>>>>>>> >
>>>>>>>> > @Override
>>>>>>>> >
>>>>>>>> > public void configure(JobConf job) {
>>>>>>>> >
>>>>>>>> > System.out.println("FREE MEMORY -- "
>>>>>>>> >
>>>>>>>> > + Runtime.getRuntime().freeMemory());
>>>>>>>> >
>>>>>>>> > System.out.println("MAX MEMORY ---" +
>>>>>>>> Runtime.getRuntime().maxMemory());
>>>>>>>> >
>>>>>>>> > }
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Surprisingly the output was
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > FREE MEMORY -- 341854864  = 320 MB
>>>>>>>> > MAX MEMORY ---1908932608  = 1.9 GB
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > I am just wondering what processes are taking up that extra 1.6GB
>>>>>>>> of heap
>>>>>>>> > which I configured for the child jvm heap.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Appreciate in helping me understand the scenario.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Regards
>>>>>>>> >
>>>>>>>> > Nagarjuna K
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sent from iPhone
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Child JVM memory allocation / Usage

Posted by nagarjuna kanamarlapudi <na...@gmail.com>.
Hi hemanth,

This sounds interesting; I will try it out on the pseudo cluster. But the
real problem for me is that the cluster is maintained by a third party. I
only have an edge node through which I can submit the jobs.

Is there any other way of getting the dump, instead of physically going to
that machine and checking it out?



On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
yhemanth@thoughtworks.com> wrote:

> Hi,
>
> One option to find what could be taking the memory is to use jmap on the
> running task. The steps I followed are:
>
> - I ran a sleep job (which comes in the examples jar of the distribution -
> effectively does nothing in the mapper / reducer).
> - From the JobTracker UI looked at a map task attempt ID.
> - Then on the machine where the map task is running, got the PID of the
> running task - ps -ef | grep <task attempt id>
> - On the same machine executed jmap -histo <pid>
>
> This will give you an idea of the count of objects allocated and size.
> Jmap also has options to get a dump, that will contain more information,
> but this should help to get you started with debugging.
>
> For my sleep job task - I saw allocations worth roughly 130 MB.
>
> Thanks
> hemanth
>
>
>
>
> On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
>
>> I have a lookup file which I need in the mapper. So I am trying to read
>> the whole file and load it into list in the mapper.
>>
>> For each and every record Iook in this file which I got from distributed
>> cache.
>>
>> —
>> Sent from iPhone
>>
>>
>> On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> Hmm. How are you loading the file into memory ? Is it some sort of
>>> memory mapping etc ? Are they being read as records ? Some details of the
>>> app will help
>>>
>>>
>>> On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>
>>>> Hi Hemanth,
>>>>
>>>> I tried out your suggestion loading 420 MB file into memory. It threw
>>>> java heap space error.
>>>>
>>>> I am not sure where this 1.6 GB of configured heap went to ?
>>>>
>>>>
>>>> On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>> yhemanth@thoughtworks.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The free memory might be low, just because GC hasn't reclaimed what it
>>>>> can. Can you just try reading in the data you want to read and see if that
>>>>> works ?
>>>>>
>>>>> Thanks
>>>>> Hemanth
>>>>>
>>>>>
>>>>> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>
>>>>>> io.sort.mb = 256 MB
>>>>>>
>>>>>>
>>>>>> On Monday, March 25, 2013, Harsh J wrote:
>>>>>>
>>>>>>> The MapTask may consume some memory of its own as well. What is your
>>>>>>> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>>>
>>>>>>> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>>>> <na...@gmail.com> wrote:
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > I configured  my child jvm heap to 2 GB. So, I thought I could
>>>>>>> really read
>>>>>>> > 1.5GB of data and store it in memory (mapper/reducer).
>>>>>>> >
>>>>>>> > I wanted to confirm the same and wrote the following piece of code
>>>>>>> in the
>>>>>>> > configure method of mapper.
>>>>>>> >
>>>>>>> > @Override
>>>>>>> >
>>>>>>> > public void configure(JobConf job) {
>>>>>>> >
>>>>>>> > System.out.println("FREE MEMORY -- "
>>>>>>> >
>>>>>>> > + Runtime.getRuntime().freeMemory());
>>>>>>> >
>>>>>>> > System.out.println("MAX MEMORY ---" +
>>>>>>> Runtime.getRuntime().maxMemory());
>>>>>>> >
>>>>>>> > }
>>>>>>> >
>>>>>>> >
>>>>>>> > Surprisingly the output was
>>>>>>> >
>>>>>>> >
>>>>>>> > FREE MEMORY -- 341854864  = 320 MB
>>>>>>> > MAX MEMORY ---1908932608  = 1.9 GB
>>>>>>> >
>>>>>>> >
>>>>>>> > I am just wondering what processes are taking up that extra 1.6GB
>>>>>>> of heap
>>>>>>> > which I configured for the child jvm heap.
>>>>>>> >
>>>>>>> >
>>>>>>> > Appreciate in helping me understand the scenario.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > Regards
>>>>>>> >
>>>>>>> > Nagarjuna K
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Harsh J
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent from iPhone
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

One option to find what could be taking the memory is to use jmap on the
running task. The steps I followed are:

- I ran a sleep job (which comes in the examples jar of the distribution -
effectively does nothing in the mapper / reducer).
- From the JobTracker UI looked at a map task attempt ID.
- Then on the machine where the map task is running, got the PID of the
running task - ps -ef | grep <task attempt id>
- On the same machine executed jmap -histo <pid>

This will give you an idea of the count and size of the allocated objects.
Jmap also has options to get a dump, which will contain more information,
but this should help to get you started with debugging.

For my sleep job task - I saw allocations worth roughly 130 MB.

Thanks
hemanth




On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> I have a lookup file which I need in the mapper. So I am trying to read
> the whole file and load it into list in the mapper.
>
> For each and every record Iook in this file which I got from distributed
> cache.
>
> —
> Sent from iPhone
>
>
> On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Hmm. How are you loading the file into memory ? Is it some sort of memory
>> mapping etc ? Are they being read as records ? Some details of the app will
>> help
>>
>>
>> On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>
>>> Hi Hemanth,
>>>
>>> I tried out your suggestion loading 420 MB file into memory. It threw
>>> java heap space error.
>>>
>>> I am not sure where this 1.6 GB of configured heap went to ?
>>>
>>>
>>> On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>> yhemanth@thoughtworks.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> The free memory might be low, just because GC hasn't reclaimed what it
>>>> can. Can you just try reading in the data you want to read and see if that
>>>> works ?
>>>>
>>>> Thanks
>>>> Hemanth
>>>>
>>>>
>>>> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>
>>>>> io.sort.mb = 256 MB
>>>>>
>>>>>
>>>>> On Monday, March 25, 2013, Harsh J wrote:
>>>>>
>>>>>> The MapTask may consume some memory of its own as well. What is your
>>>>>> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>>
>>>>>> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>>> <na...@gmail.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I configured  my child jvm heap to 2 GB. So, I thought I could
>>>>>> really read
>>>>>> > 1.5GB of data and store it in memory (mapper/reducer).
>>>>>> >
>>>>>> > I wanted to confirm the same and wrote the following piece of code
>>>>>> in the
>>>>>> > configure method of mapper.
>>>>>> >
>>>>>> > @Override
>>>>>> >
>>>>>> > public void configure(JobConf job) {
>>>>>> >
>>>>>> > System.out.println("FREE MEMORY -- "
>>>>>> >
>>>>>> > + Runtime.getRuntime().freeMemory());
>>>>>> >
>>>>>> > System.out.println("MAX MEMORY ---" +
>>>>>> Runtime.getRuntime().maxMemory());
>>>>>> >
>>>>>> > }
>>>>>> >
>>>>>> >
>>>>>> > Surprisingly the output was
>>>>>> >
>>>>>> >
>>>>>> > FREE MEMORY -- 341854864  = 320 MB
>>>>>> > MAX MEMORY ---1908932608  = 1.9 GB
>>>>>> >
>>>>>> >
>>>>>> > I am just wondering what processes are taking up that extra 1.6GB
>>>>>> of heap
>>>>>> > which I configured for the child jvm heap.
>>>>>> >
>>>>>> >
>>>>>> > Appreciate in helping me understand the scenario.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > Regards
>>>>>> >
>>>>>> > Nagarjuna K
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sent from iPhone
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Child JVM memory allocation / Usage

Posted by Nagarjuna Kanamarlapudi <na...@gmail.com>.
I have a lookup file which I need in the mapper. So I am trying to read the whole file and load it into a list in the mapper. 


For each and every record, I look in this file, which I got from the distributed cache. 


—
Sent from  iPhone
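
A minimal sketch of that pattern against the old MR1 API is below, purely for
illustration: the class name, the tab-separated key/value layout and the field
handling are assumptions, not details taken from the actual job. Loading the
cached file into a HashMap keyed by the lookup field, rather than a flat list
that is scanned for every record, usually keeps both the memory footprint and
the per-record cost down.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class LookupJoinMapperSketch extends MapReduceBase {

    // Built once per task in configure(), then consulted for every input record.
    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    public void configure(JobConf job) {
        try {
            // Local copies of the files registered in the distributed cache;
            // this sketch assumes the lookup file is the first (and only) one.
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Hypothetical record layout: key <TAB> value
                    String[] parts = line.split("\t", 2);
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new RuntimeException("Could not load lookup file from distributed cache", e);
        }
    }
}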

On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala
<yh...@thoughtworks.com> wrote:

> Hmm. How are you loading the file into memory ? Is it some sort of memory
> mapping etc ? Are they being read as records ? Some details of the app will
> help
> On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
>> Hi Hemanth,
>>
>> I tried out your suggestion loading 420 MB file into memory. It threw java
>> heap space error.
>>
>> I am not sure where this 1.6 GB of configured heap went to ?
>>
>>
>> On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> Hi,
>>>
>>> The free memory might be low, just because GC hasn't reclaimed what it
>>> can. Can you just try reading in the data you want to read and see if that
>>> works ?
>>>
>>> Thanks
>>> Hemanth
>>>
>>>
>>> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>
>>>> io.sort.mb = 256 MB
>>>>
>>>>
>>>> On Monday, March 25, 2013, Harsh J wrote:
>>>>
>>>>> The MapTask may consume some memory of its own as well. What is your
>>>>> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>
>>>>> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>> <na...@gmail.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I configured  my child jvm heap to 2 GB. So, I thought I could really
>>>>> read
>>>>> > 1.5GB of data and store it in memory (mapper/reducer).
>>>>> >
>>>>> > I wanted to confirm the same and wrote the following piece of code in
>>>>> the
>>>>> > configure method of mapper.
>>>>> >
>>>>> > @Override
>>>>> >
>>>>> > public void configure(JobConf job) {
>>>>> >
>>>>> > System.out.println("FREE MEMORY -- "
>>>>> >
>>>>> > + Runtime.getRuntime().freeMemory());
>>>>> >
>>>>> > System.out.println("MAX MEMORY ---" +
>>>>> Runtime.getRuntime().maxMemory());
>>>>> >
>>>>> > }
>>>>> >
>>>>> >
>>>>> > Surprisingly the output was
>>>>> >
>>>>> >
>>>>> > FREE MEMORY -- 341854864  = 320 MB
>>>>> > MAX MEMORY ---1908932608  = 1.9 GB
>>>>> >
>>>>> >
>>>>> > I am just wondering what processes are taking up that extra 1.6GB of
>>>>> heap
>>>>> > which I configured for the child jvm heap.
>>>>> >
>>>>> >
>>>>> > Appreciate in helping me understand the scenario.
>>>>> >
>>>>> >
>>>>> >
>>>>> > Regards
>>>>> >
>>>>> > Nagarjuna K
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>> --
>>>> Sent from iPhone
>>>>
>>>
>>>
>>

Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hmm. How are you loading the file into memory? Is it some sort of memory
mapping, etc.? Are the contents being read as records? Some details of the app
will help.


On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> Hi Hemanth,
>
> I tried out your suggestion loading 420 MB file into memory. It threw java
> heap space error.
>
> I am not sure where this 1.6 GB of configured heap went to ?
>
>
> On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Hi,
>>
>> The free memory might be low, just because GC hasn't reclaimed what it
>> can. Can you just try reading in the data you want to read and see if that
>> works ?
>>
>> Thanks
>> Hemanth
>>
>>
>> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>
>>> io.sort.mb = 256 MB
>>>
>>>
>>> On Monday, March 25, 2013, Harsh J wrote:
>>>
>>>> The MapTask may consume some memory of its own as well. What is your
>>>> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>
>>>> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>> <na...@gmail.com> wrote:
>>>> > Hi,
>>>> >
>>>> > I configured  my child jvm heap to 2 GB. So, I thought I could really
>>>> read
>>>> > 1.5GB of data and store it in memory (mapper/reducer).
>>>> >
>>>> > I wanted to confirm the same and wrote the following piece of code in
>>>> the
>>>> > configure method of mapper.
>>>> >
>>>> > @Override
>>>> >
>>>> > public void configure(JobConf job) {
>>>> >
>>>> > System.out.println("FREE MEMORY -- "
>>>> >
>>>> > + Runtime.getRuntime().freeMemory());
>>>> >
>>>> > System.out.println("MAX MEMORY ---" +
>>>> Runtime.getRuntime().maxMemory());
>>>> >
>>>> > }
>>>> >
>>>> >
>>>> > Surprisingly the output was
>>>> >
>>>> >
>>>> > FREE MEMORY -- 341854864  = 320 MB
>>>> > MAX MEMORY ---1908932608  = 1.9 GB
>>>> >
>>>> >
>>>> > I am just wondering what processes are taking up that extra 1.6GB of
>>>> heap
>>>> > which I configured for the child jvm heap.
>>>> >
>>>> >
>>>> > Appreciate in helping me understand the scenario.
>>>> >
>>>> >
>>>> >
>>>> > Regards
>>>> >
>>>> > Nagarjuna K
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>> --
>>> Sent from iPhone
>>>
>>
>>
>

Re: Child JVM memory allocation / Usage

Posted by nagarjuna kanamarlapudi <na...@gmail.com>.
Hi Hemanth,

I tried out your suggestion, loading a 420 MB file into memory. It threw a
Java heap space error.

I am not sure where this 1.6 GB of configured heap went.
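
Rough arithmetic for why that can happen (the Java 6/7 JVMs current at the time
store String text in a char[] at two bytes per character): each line becomes a
String plus its backing array plus an ArrayList slot, adding several tens of
bytes of overhead per line, so a 420 MB ASCII file held as one String per line
can easily need 900 MB or more of heap. Add io.sort.mb at 256 MB and the
MapTask's own structures, and a 2 GB heap is not hard to exhaust. The standalone
sketch below (run outside Hadoop, with the file path as its only argument)
estimates the in-heap cost of such a list; the before/after measurement is only
approximate because GC timing affects totalMemory() and freeMemory().

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

public class LineListHeapCostSketch {
    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();

        // Load every line into memory, mirroring what the mapper's configure() does.
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        reader.close();

        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Lines loaded          : " + lines.size());
        System.out.println("Approx heap used (MB) : " + (after - before) / (1024 * 1024));
    }
}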


On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
yhemanth@thoughtworks.com> wrote:

> Hi,
>
> The free memory might be low, just because GC hasn't reclaimed what it
> can. Can you just try reading in the data you want to read and see if that
> works ?
>
> Thanks
> Hemanth
>
>
> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
>
>> io.sort.mb = 256 MB
>>
>>
>> On Monday, March 25, 2013, Harsh J wrote:
>>
>>> The MapTask may consume some memory of its own as well. What is your
>>> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>
>>> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>> <na...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I configured  my child jvm heap to 2 GB. So, I thought I could really
>>> read
>>> > 1.5GB of data and store it in memory (mapper/reducer).
>>> >
>>> > I wanted to confirm the same and wrote the following piece of code in
>>> the
>>> > configure method of mapper.
>>> >
>>> > @Override
>>> >
>>> > public void configure(JobConf job) {
>>> >
>>> > System.out.println("FREE MEMORY -- "
>>> >
>>> > + Runtime.getRuntime().freeMemory());
>>> >
>>> > System.out.println("MAX MEMORY ---" +
>>> Runtime.getRuntime().maxMemory());
>>> >
>>> > }
>>> >
>>> >
>>> > Surprisingly the output was
>>> >
>>> >
>>> > FREE MEMORY -- 341854864  = 320 MB
>>> > MAX MEMORY ---1908932608  = 1.9 GB
>>> >
>>> >
>>> > I am just wondering what processes are taking up that extra 1.6GB of
>>> heap
>>> > which I configured for the child jvm heap.
>>> >
>>> >
>>> > Appreciate in helping me understand the scenario.
>>> >
>>> >
>>> >
>>> > Regards
>>> >
>>> > Nagarjuna K
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>> --
>> Sent from iPhone
>>
>
>

Re: Child JVM memory allocation / Usage

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

The free memory might be low, just because GC hasn't reclaimed what it can.
Can you just try reading in the data you want to read and see if that works?

Thanks
Hemanth
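
Related to that: freeMemory() on its own only reports the free space inside the
heap the JVM has committed so far, not the distance to the -Xmx ceiling. A small
variant of the configure() logging from the original mail (same idea, a couple
of extra fields) makes this visible; used heap is totalMemory() minus
freeMemory(), and maxMemory() minus that figure is the real headroom.

@Override
public void configure(JobConf job) {
    Runtime rt = Runtime.getRuntime();
    long used      = rt.totalMemory() - rt.freeMemory(); // heap actually occupied by objects
    long committed = rt.totalMemory();                   // heap the JVM has committed so far
    long max       = rt.maxMemory();                     // the -Xmx ceiling
    System.out.println("USED      -- " + used);
    System.out.println("COMMITTED -- " + committed);
    System.out.println("MAX       -- " + max);
    System.out.println("HEADROOM  -- " + (max - used)); // roughly what is left to grow into
}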


On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> io.sort.mb = 256 MB
>
>
> On Monday, March 25, 2013, Harsh J wrote:
>
>> The MapTask may consume some memory of its own as well. What is your
>> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>
>> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>> <na...@gmail.com> wrote:
>> > Hi,
>> >
>> > I configured  my child jvm heap to 2 GB. So, I thought I could really
>> read
>> > 1.5GB of data and store it in memory (mapper/reducer).
>> >
>> > I wanted to confirm the same and wrote the following piece of code in
>> the
>> > configure method of mapper.
>> >
>> > @Override
>> >
>> > public void configure(JobConf job) {
>> >
>> > System.out.println("FREE MEMORY -- "
>> >
>> > + Runtime.getRuntime().freeMemory());
>> >
>> > System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
>> >
>> > }
>> >
>> >
>> > Surprisingly the output was
>> >
>> >
>> > FREE MEMORY -- 341854864  = 320 MB
>> > MAX MEMORY ---1908932608  = 1.9 GB
>> >
>> >
>> > I am just wondering what processes are taking up that extra 1.6GB of
>> heap
>> > which I configured for the child jvm heap.
>> >
>> >
>> > Appreciate in helping me understand the scenario.
>> >
>> >
>> >
>> > Regards
>> >
>> > Nagarjuna K
>> >
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
> --
> Sent from iPhone
>

Re: Child JVM memory allocation / Usage

Posted by nagarjuna kanamarlapudi <na...@gmail.com>.
io.sort.mb = 256 MB

On Monday, March 25, 2013, Harsh J wrote:

> The MapTask may consume some memory of its own as well. What is your
> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>
> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
> <nagarjuna.kanamarlapudi@gmail.com <javascript:;>> wrote:
> > Hi,
> >
> > I configured  my child jvm heap to 2 GB. So, I thought I could really
> read
> > 1.5GB of data and store it in memory (mapper/reducer).
> >
> > I wanted to confirm the same and wrote the following piece of code in the
> > configure method of mapper.
> >
> > @Override
> >
> > public void configure(JobConf job) {
> >
> > System.out.println("FREE MEMORY -- "
> >
> > + Runtime.getRuntime().freeMemory());
> >
> > System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
> >
> > }
> >
> >
> > Surprisingly the output was
> >
> >
> > FREE MEMORY -- 341854864  = 320 MB
> > MAX MEMORY ---1908932608  = 1.9 GB
> >
> >
> > I am just wondering what processes are taking up that extra 1.6GB of heap
> > which I configured for the child jvm heap.
> >
> >
> > Appreciate in helping me understand the scenario.
> >
> >
> >
> > Regards
> >
> > Nagarjuna K
> >
> >
> >
>
>
>
> --
> Harsh J
>


-- 
Sent from iPhone

Re: Child JVM memory allocation / Usage

Posted by Harsh J <ha...@cloudera.com>.
The MapTask may consume some memory of its own as well. What is your
io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
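
A quick way to see how much of the configured heap the map-side sort buffer
alone claims is to log it next to maxMemory(). This is only a rough sketch: the
MapTask has other overhead beyond the sort buffer, so the remainder printed here
is an upper bound on what user code can actually use.

@Override
public void configure(JobConf job) {
    // MR1 property name; the MR2 equivalent is mapreduce.task.io.sort.mb. 100 is the MR1 default.
    int sortMb = job.getInt("io.sort.mb", 100);
    long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
    System.out.println("io.sort.mb                  = " + sortMb + " MB");
    System.out.println("max heap                    = " + maxHeapMb + " MB");
    System.out.println("left after sort buffer (<=) = " + (maxHeapMb - sortMb) + " MB");
}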

On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
<na...@gmail.com> wrote:
> Hi,
>
> I configured  my child jvm heap to 2 GB. So, I thought I could really read
> 1.5GB of data and store it in memory (mapper/reducer).
>
> I wanted to confirm the same and wrote the following piece of code in the
> configure method of mapper.
>
> @Override
>
> public void configure(JobConf job) {
>
> System.out.println("FREE MEMORY -- "
>
> + Runtime.getRuntime().freeMemory());
>
> System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
>
> }
>
>
> Surprisingly the output was
>
>
> FREE MEMORY -- 341854864  = 320 MB
> MAX MEMORY ---1908932608  = 1.9 GB
>
>
> I am just wondering what processes are taking up that extra 1.6GB of heap
> which I configured for the child jvm heap.
>
>
> Appreciate in helping me understand the scenario.
>
>
>
> Regards
>
> Nagarjuna K
>
>
>



-- 
Harsh J
