Posted to common-user@hadoop.apache.org by Shi Yu <sh...@uchicago.edu> on 2010/10/11 22:50:03 UTC
load a serialized object in hadoop
Hi,
I want to load a serialized HashMap object in Hadoop. The file of the
stored object is 200 MB. I can read that object efficiently in plain
Java by setting -Xmx to 1000M. However, in Hadoop I can never load it
into memory. The code is very simple (it just reads an
ObjectInputStream) and there is no map/reduce implemented yet. I set
mapred.child.java.opts=-Xmx3000M and still get
"java.lang.OutOfMemoryError: Java heap space". Could anyone explain a
little how memory is allocated to the JVM in Hadoop? Why does Hadoop
take up so much memory? If a program requires 1 GB of memory on a single
node, how much memory does it (generally) require in Hadoop?
Thanks.
Shi
--
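[Editor's note] The loading pattern described above can be reproduced end to end at toy scale. The sketch below (class, method, and file names are illustrative, not from the thread) serializes a HashMap with ObjectOutputStream and reads it back with ObjectInputStream, the same pattern as the 200 MB xxx.dat case:

```java
import java.io.*;
import java.util.HashMap;

public class SerializedMapDemo {

    // Serialize a HashMap to a temp file and read it back -- the same
    // ObjectInputStream pattern used in the thread, at toy scale.
    static int roundTrip(int entries) throws IOException, ClassNotFoundException {
        HashMap<String, Integer> map = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            map.put("key" + i, i);
        }

        File f = File.createTempFile("margintag", ".dat");
        f.deleteOnExit();
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            oos.writeObject(map);
        }

        // readObject() returns Object, so the result must be cast back.
        try (ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(f)))) {
            @SuppressWarnings("unchecked")
            HashMap<String, Integer> loaded = (HashMap<String, Integer>) ois.readObject();
            return loaded.size();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(1000));  // prints 1000
    }
}
```

Note that a deserialized map typically occupies several times its on-disk size in live heap (boxed keys and values plus HashMap bucket overhead), which is one reason a 200 MB file can exhaust a heap much larger than 200 MB.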
Re: load a serialized object in hadoop
Posted by Konstantin Boudnik <co...@boudnik.org>.
You should have no space here "-D HADOOP_CLIENT_OPTS"
On Wed, Oct 13, 2010 at 04:21PM, Shi Yu wrote:
> Hi, thanks for the advice. I tried with your settings,
> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>
> still no effect. Or is this a system variable? Should I export it?
> How do I configure it?
>
> Shi
>
> java -Xms3G -Xmx3G -classpath .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
> OOloadtest
>
>
> On 2010-10-13 15:28, Luke Lu wrote:
> >On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
> >>I haven't implemented anything in map/reduce yet for this issue. I just
> >>tried to invoke the same java class using the bin/hadoop command. The
> >>thing is, a very simple program can be executed in Java, but not via the
> >>bin/hadoop command.
> >If you are just trying to use bin/hadoop jar your.jar command, your
> >code runs in a local client jvm and mapred.child.java.opts has no
> >effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
> >jar your.jar
> >
> >>I think if I couldn't get through the first stage, even if I had a
> >>map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
> >>
> >>Best Regards,
> >>
> >>Shi
> >>
> >>On 2010-10-13 14:15, Luke Lu wrote:
> >>>Can you post your mapper/reducer implementation? or are you using
> >>>hadoop streaming? for which mapred.child.java.opts doesn't apply to
> >>>the jvm you care about. BTW, what's the hadoop version you're using?
> >>>
> >>>On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
> >>>
> >>>>Here is my code. There is no Map/Reduce in it. I can run this code
> >>>>using java -Xmx1000m; however, when using bin/hadoop -D
> >>>>mapred.child.java.opts=-Xmx3000M it fails with a heap-space error. I
> >>>>have tried other programs in Hadoop with the same settings, so the
> >>>>memory is available on my machines.
> >>>>
> >>>>
> >>>>public static void main(String[] args) {
> >>>>    try {
> >>>>        String myFile = "xxx.dat";
> >>>>        FileInputStream fin = new FileInputStream(myFile);
> >>>>        ObjectInputStream ois = new ObjectInputStream(fin);
> >>>>        Object margintagMap = ois.readObject();
> >>>>        ois.close();
> >>>>        fin.close();
> >>>>    } catch (Exception e) {
> >>>>        e.printStackTrace();  // at least log the failure instead of swallowing it
> >>>>    }
> >>>>}
> >>>>
> >>>>On 2010-10-13 13:30, Luke Lu wrote:
> >>>>
> >>>>>On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
> >>>>>
> >>>>>
> >>>>>>As a follow-up to my own question, I think invoking the JVM in
> >>>>>>Hadoop requires much more memory than an ordinary JVM.
> >>>>>>
> >>>>>>
> >>>>>That's simply not true. The default mapreduce task Xmx is 200M, which
> >>>>>is much smaller than the standard jvm default 512M and most users
> >>>>>don't need to increase it. Please post the code reading the object (in
> >>>>>hdfs?) in your tasks.
> >>>>>
> >>>>>
> >>>>>
> >>>>>>I found that instead of serializing the object, maybe I could create
> >>>>>>a MapFile as an index to permit lookups by key in Hadoop. I have also
> >>>>>>compared the performance of MongoDB and Memcached. I will let you
> >>>>>>know the result after I try the MapFile approach.
> >>>>>>
> >>>>>>Shi
> >>>>>>
> >>>>>>On 2010-10-12 21:59, M. C. Srivas wrote:
> >>>>>>
> >>>>>>>>On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>[...]
> >>>>>>>
> >>>>>>>The JVM reserves swap space in advance, at the time of launching the
> >>>>>>>process. If your swap is too low (or you do not have any swap
> >>>>>>>configured), you will hit this.
> >>>>>>>
> >>>>>>>Or, you are on a 32-bit machine, in which case 3G is not possible in
> >>>>>>>the JVM.
> >>>>>>>
> >>>>>>>-Srivas.
> >>>>>>
> >>>>--
> >>>>Postdoctoral Scholar
> >>>>Institute for Genomics and Systems Biology
> >>>>Department of Medicine, the University of Chicago
> >>>>Knapp Center for Biomedical Discovery
> >>>>900 E. 57th St. Room 10148
> >>>>Chicago, IL 60637, US
> >>>>Tel: 773-702-6799
> >>>>
> >>>>
> >>>>
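[Editor's note] The MapFile idea floated above — keep the data on disk, hold only a small key-to-offset index in memory, and seek to a value by key — can be sketched in plain Java without any Hadoop dependency. Class, method, and file names here are illustrative, not from the thread:

```java
import java.io.*;
import java.util.TreeMap;

public class IndexedLookupDemo {

    // Write sorted key/value records to disk, keeping only a small
    // key -> file-offset index in memory (the idea behind Hadoop's MapFile).
    static int lookupDemo() throws IOException {
        File data = File.createTempFile("lookup", ".dat");
        data.deleteOnExit();

        TreeMap<String, Long> index = new TreeMap<>();
        try (RandomAccessFile out = new RandomAccessFile(data, "rw")) {
            for (int i = 0; i < 100; i++) {
                String key = String.format("key%03d", i);  // MapFile likewise requires sorted keys
                index.put(key, out.getFilePointer());
                out.writeUTF(key);
                out.writeInt(i * i);                       // the value
            }
        }

        // Look one key up by seeking to its offset, instead of
        // deserializing the whole structure into the heap.
        try (RandomAccessFile in = new RandomAccessFile(data, "r")) {
            in.seek(index.get("key042"));
            in.readUTF();            // consume the key
            return in.readInt();     // 42 * 42
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lookupDemo());  // prints 1764
    }
}
```

The real MapFile adds block compression and a sparse index, but the memory trade-off is the same: the heap holds offsets, not values.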
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Just a reminder and a warning: changing the
HADOOP_CLIENT_OPTS
HADOOP_OPTS
entries in hadoop-env.sh improperly may cause Hadoop to crash.
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Thanks. Well, I set the value to 3000M in hadoop-env.sh so that it has
the same configuration as Java:
export HADOOP_HEAPSIZE=3000
export HADOOP_CLIENT_OPTS=3000
Then I did the comparison:
sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ bin/hadoop jar WordCount.jar
OOloadtest
timing (hms): 0 hour(s) 2 minute(s) 53 second(s) 599millisecond(s)
sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
timing (hms): 0 hour(s) 1 minute(s) 14 second(s) 7millisecond(s)
It seems the hadoop command is 50% slower. I guess that was because I
didn't set the "initial heap size" correctly (corresponding to the -Xms
option in Java). I tried this:
sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ bin/hadoop jar WordCount.jar
OOloadtest -D mapred.child.java.opts=-Xmx3000M
timing (hms): 0 hour(s) 3 minute(s) 0 second(s) 343millisecond(s)
I also tried
HADOOP_OPTS=-Xmx3000M bin/hadoop jar WordCount.jar OOloadtest
timing (hms): 0 hour(s) 3 minute(s) 7 second(s) 774millisecond(s)
Now it works!
So how do I set the initial heap size: HADOOP_NAMENODE_OPTS or
HADOOP_CLIENT_OPTS? Because there is a 50% difference in speed.
Shi
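[Editor's note] Since HADOOP_OPTS is simply prepended to the java command line by the bin/hadoop script, the missing -Xms can be supplied the same way. A sketch using the thread's own sizes (values illustrative):

```shell
# One-shot: pass both initial (-Xms) and maximum (-Xmx) heap to the
# client JVM. On 0.19.2 HADOOP_OPTS reaches the `jar` command; on later
# releases HADOOP_CLIENT_OPTS is the supported knob for client commands.
HADOOP_OPTS="-Xms3000m -Xmx3000m" bin/hadoop jar WordCount.jar OOloadtest

# Or persistently, in conf/hadoop-env.sh:
export HADOOP_CLIENT_OPTS="-Xms3000m -Xmx3000m"
```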
On 2010-10-13 19:04, Luke Lu wrote:
> Just took a look at the bin/hadoop of your particular version
> (http://svn.apache.org/viewvc/hadoop/common/tags/release-0.19.2/bin/hadoop?revision=796970&view=markup).
> It looks like that HADOOP_CLIENT_OPTS doesn't work with the jar
> command, which is fixed in later version.
>
> So try HADOOP_OPTS=-Xmx1000M bin/hadoop ... instead. It would work
> because it just translates to the same java command line that worked
> for you :)
>
> __Luke
>
> On Wed, Oct 13, 2010 at 4:18 PM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> Hi, I tried the following five ways:
>>
>> Approach 1: in command line
>> HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
>>
>>
>> Approach 2: I added the hadoop-site.xml file with the following element.
>> Each time I changed, I stop and restart hadoop on all the nodes.
>> ...
>> <property>
>> <name>HADOOP_CLIENT_OPTS</name>
>> <value>-Xmx4000m</value>
>> </property>
>>
>> run the command
>> $bin/hadoop jar WordCount.jar OOloadtest
>>
>> Approach 3: I changed like this
>> ...
>> <property>
>> <name>HADOOP_CLIENT_OPTS</name>
>> <value>4000m</value>
>> </property>
>> ....
>>
>> Then run the command:
>> $bin/hadoop jar WordCount.jar OOloadtest
>>
>> Approach 4: To make sure, I changed the "m" to numbers, that was
>> ...
>> <property>
>> <name>HADOOP_CLIENT_OPTS</name>
>> <value>4000000000</value>
>> </property>
>> ....
>>
>> Then run the command:
>> $bin/hadoop jar WordCount.jar OOloadtest
>>
>> All these four approaches come to the same "Java heap space" error.
>>
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
>> at java.lang.StringBuilder.<init>(StringBuilder.java:68)
>> at
>> java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
>> at
>> java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
>> at java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>> at java.util.HashMap.readObject(HashMap.java:1028)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at
>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>> at ObjectManager.loadObject(ObjectManager.java:42)
>> at OOloadtest.main(OOloadtest.java:21)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>> at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>
>>
>> Approach 5:
>> In comparison, I called the Java command directly as follows (there is a
>> counter showing how much time it costs if the serialized object is
>> successfully loaded):
>>
>> $java -Xms3G -Xmx3G -classpath
>> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
>>
>> return:
>> object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s)
>> 162millisecond(s)
>>
>>
>> What was the problem in my command? Where can I find the documentation about
>> HADOOP_CLIENT_OPTS? Have you tried the same thing and found it works?
>>
>> Shi
>>
>>
>> On 2010-10-13 16:28, Luke Lu wrote:
>>
>>> On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>> Hi, thanks for the advice. I tried with your settings,
>>>> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>>>> still no effect. Or this is a system variable? Should I export it? How to
>>>> configure it?
>>>>
>>>>
>>> HADOOP_CLIENT_OPTS is an environment variable so you should run it as
>>> HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
>>>
>>> if you use an sh-derivative shell (bash, ksh, etc.); prepend env for
>>> other shells.
>>>
>>> __Luke
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
Just took a look at the bin/hadoop of your particular version
(http://svn.apache.org/viewvc/hadoop/common/tags/release-0.19.2/bin/hadoop?revision=796970&view=markup).
It looks like HADOOP_CLIENT_OPTS doesn't work with the jar
command; this is fixed in later versions.
So try HADOOP_OPTS=-Xmx1000M bin/hadoop ... instead. It would work
because it just translates to the same java command line that worked
for you :)
__Luke
On Wed, Oct 13, 2010 at 4:18 PM, Shi Yu <sh...@uchicago.edu> wrote:
> Hi, I tried the following five ways:
>
> Approach 1: in command line
> HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
>
> [...]
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Hi, I got it: it should be declared in hadoop-env.sh:
export HADOOP_CLIENT_OPTS=-Xmx4000m
Thanks! At the same time I see corrections coming in.
Shi
On 2010-10-13 18:18, Shi Yu wrote:
> Hi, I tried the following five ways:
>
> Approach 1: in command line
> HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
>
>
> Approach 2: I added the hadoop-site.xml file with the following
> element. Each time I changed, I stop and restart hadoop on all the nodes.
> ...
> <property>
> <name>HADOOP_CLIENT_OPTS</name>
> <value>-Xmx4000m</value>
> </property>
>
> run the command
> $bin/hadoop jar WordCount.jar OOloadtest
>
> Approach 3: I changed like this
> ...
> <property>
> <name>HADOOP_CLIENT_OPTS</name>
> <value>4000m</value>
> </property>
> ....
>
> Then run the command:
> $bin/hadoop jar WordCount.jar OOloadtest
>
> Approach 4: To make sure, I changed the "m" to numbers, that was
> ...
> <property>
> <name>HADOOP_CLIENT_OPTS</name>
> <value>4000000000</value>
> </property>
> ....
>
> Then run the command:
> $bin/hadoop jar WordCount.jar OOloadtest
>
> All these four approaches come to the same "Java heap space" error.
>
> java.lang.OutOfMemoryError: Java heap space
> at
> java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
> at java.lang.StringBuilder.<init>(StringBuilder.java:68)
> at
> java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
>
> at
> java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
>
> at
> java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
> at
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
> at java.util.HashMap.readObject(HashMap.java:1028)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
> at
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
> at ObjectManager.loadObject(ObjectManager.java:42)
> at OOloadtest.main(OOloadtest.java:21)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
>
> Approach 5:
> In comparison, I called the Java command directly as follows (there is
> a counter showing how much time it costs if the serialized object is
> successfully loaded):
>
> $java -Xms3G -Xmx3G -classpath
> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
>
> return:
> object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s)
> 162millisecond(s)
>
>
> What was the problem in my command? Where can I find the documentation
> about HADOOP_CLIENT_OPTS? Have you tried the same thing and found it
> works?
>
> Shi
>
>
> On 2010-10-13 16:28, Luke Lu wrote:
>> On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>> Hi, thanks for the advice. I tried with your settings,
>>> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>>> still no effect. Or this is a system variable? Should I export it?
>>> How to
>>> configure it?
>> HADOOP_CLIENT_OPTS is an environment variable so you should run it as
>> HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
>>
>> if you use sh derivative shells (bash, ksh etc.) prepend env for
>> other shells.
>>
>> __Luke
>>
>>
>>> Shi
>>>
>>> java -Xms3G -Xmx3G -classpath
>>> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
>>>
>>> OOloadtest
>>>
>>>
>>> On 2010-10-13 15:28, Luke Lu wrote:
>>>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>>>> try to invoke the same java class using the bin/hadoop command. The thing
>>>>> is, a very simple program can be executed in Java, but is not doable via
>>>>> the bin/hadoop command.
>>>> If you are just trying to use the bin/hadoop jar your.jar command, your
>>>> code runs in a local client jvm and mapred.child.java.opts has no
>>>> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
>>>> jar your.jar
>>>>> I think if I couldn't get through the first stage, even if I had a
>>>>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Shi
>>>>>
>>>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>>>> Can you post your mapper/reducer implementation? Or are you using
>>>>>> hadoop streaming, for which mapred.child.java.opts doesn't apply to
>>>>>> the jvm you care about? BTW, what's the hadoop version you're using?
>>>>>>
>>>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>>>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>>>>>> mapred.child.java.opts=-Xmx3000M it runs out of heap space. I have
>>>>>>> tried other programs in Hadoop with the same settings, so the memory
>>>>>>> is available on my machines.
>>>>>>>
>>>>>>> public static void main(String[] args) {
>>>>>>>     try{
>>>>>>>         String myFile = "xxx.dat";
>>>>>>>         FileInputStream fin = new FileInputStream(myFile);
>>>>>>>         ois = new ObjectInputStream(fin);
>>>>>>>         margintagMap = ois.readObject();
>>>>>>>         ois.close();
>>>>>>>         fin.close();
>>>>>>>     }catch(Exception e){
>>>>>>>         //
>>>>>>>     }
>>>>>>> }
>>>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>> As a follow-up to my own question, I think invoking the JVM in Hadoop
>>>>>>>>> requires much more memory than an ordinary JVM.
>>>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>>>>> don't need to increase it. Please post the code reading the object (in
>>>>>>>> hdfs?) in your tasks.
>>>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>>>>> the result after I try the MapFile approach.
>>>>>>>>>
>>>>>>>>> Shi
>>>>>>>>>
>>>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>>>>>>> JAVA by setting -Xmx as 1000M. However, in hadoop I could never
>>>>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>>>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>>>>> on a single node, how much memory does it require (generally) in
>>>>>>>>>>>> Hadoop?
>>>>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>>>>> configured), you will hit this.
>>>>>>>>>>
>>>>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>>>>> the JVM.
>>>>>>>>>>
>>>>>>>>>> -Srivas.
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> Shi
>>>>>>> --
>>>>>>> Postdoctoral Scholar
>>>>>>> Institute for Genomics and Systems Biology
>>>>>>> Department of Medicine, the University of Chicago
>>>>>>> Knapp Center for Biomedical Discovery
>>>>>>> 900 E. 57th St. Room 10148
>>>>>>> Chicago, IL 60637, US
>>>>>>> Tel: 773-702-6799
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Postdoctoral Scholar
>>>>> Institute for Genomics and Systems Biology
>>>>> Department of Medicine, the University of Chicago
>>>>> Knapp Center for Biomedical Discovery
>>>>> 900 E. 57th St. Room 10148
>>>>> Chicago, IL 60637, US
>>>>> Tel: 773-702-6799
>>>>>
>>>>>
>>>>>
>>>
>>>
>
>
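For readers skimming the archive, the fix above boils down to two shell-level options. This is a sketch: the jar and class names are the ones from this thread, the hadoop lines are shown as comments because they need a hadoop install, and only the last line is directly executable.

```shell
# One-off (sh/bash/ksh): a VAR=value prefix applies only to that one command.
#   HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest

# Persistent: export the variable from conf/hadoop-env.sh so every
# client-side invocation of bin/hadoop picks it up.
#   export HADOOP_CLIENT_OPTS=-Xmx4000m

# The prefix form is plain Bourne-shell semantics, demonstrable without hadoop:
HADOOP_CLIENT_OPTS=-Xmx4000m sh -c 'echo "$HADOOP_CLIENT_OPTS"'   # prints -Xmx4000m
```

Note this only sizes the client JVM; the map/reduce child JVMs are still governed by mapred.child.java.opts.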
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Hi, I tried the following five ways:
Approach 1: in command line
HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
Approach 2: I added the following element to the hadoop-site.xml file.
Each time I changed it, I stopped and restarted hadoop on all the nodes.
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>-Xmx4000m</value>
</property>
run the command
$bin/hadoop jar WordCount.jar OOloadtest
Approach 3: I changed it like this
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>4000m</value>
</property>
....
Then run the command:
$bin/hadoop jar WordCount.jar OOloadtest
Approach 4: To be sure, I replaced the "m" with a plain number, that is
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>4000000000</value>
</property>
....
Then run the command:
$bin/hadoop jar WordCount.jar OOloadtest
All four approaches end in the same "Java heap space" error.
java.lang.OutOfMemoryError: Java heap space
at
java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:68)
at
java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
at
java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
at
java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at java.util.HashMap.readObject(HashMap.java:1028)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at ObjectManager.loadObject(ObjectManager.java:42)
at OOloadtest.main(OOloadtest.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Approach 5:
For comparison, I ran the Java command directly as follows (a counter
reports how long loading takes when the serialized object is successfully
loaded):
$java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
return:
object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s)
162millisecond(s)
What was wrong with my command? Where can I find the documentation about
HADOOP_CLIENT_OPTS? Have you tried the same thing and found that it works?
Shi
On 2010-10-13 16:28, Luke Lu wrote:
> On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> Hi, thanks for the advice. I tried with your settings,
>> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>> still no effect. Or this is a system variable? Should I export it? How to
>> configure it?
>>
> HADOOP_CLIENT_OPTS is an environment variable so you should run it as
> HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
>
> if you use sh derivative shells (bash, ksh etc.) prepend env for other shells.
>
> __Luke
>
>
>
>> Shi
>>
>> java -Xms3G -Xmx3G -classpath
>> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
>> OOloadtest
>>
>>
>> On 2010-10-13 15:28, Luke Lu wrote:
>>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>>> try to invoke the same java class using the bin/hadoop command. The thing
>>>> is, a very simple program can be executed in Java, but is not doable via
>>>> the bin/hadoop command.
>>> If you are just trying to use the bin/hadoop jar your.jar command, your
>>> code runs in a local client jvm and mapred.child.java.opts has no
>>> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
>>> jar your.jar
>>>> I think if I couldn't get through the first stage, even if I had a
>>>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>>>
>>>> Best Regards,
>>>>
>>>> Shi
>>>>
>>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>>> Can you post your mapper/reducer implementation? Or are you using
>>>>> hadoop streaming, for which mapred.child.java.opts doesn't apply to
>>>>> the jvm you care about? BTW, what's the hadoop version you're using?
>>>>>
>>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>>>>> mapred.child.java.opts=-Xmx3000M it runs out of heap space. I have
>>>>>> tried other programs in Hadoop with the same settings, so the memory
>>>>>> is available on my machines.
>>>>>>
>>>>>> public static void main(String[] args) {
>>>>>>     try{
>>>>>>         String myFile = "xxx.dat";
>>>>>>         FileInputStream fin = new FileInputStream(myFile);
>>>>>>         ois = new ObjectInputStream(fin);
>>>>>>         margintagMap = ois.readObject();
>>>>>>         ois.close();
>>>>>>         fin.close();
>>>>>>     }catch(Exception e){
>>>>>>         //
>>>>>>     }
>>>>>> }
>>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>> As a follow-up to my own question, I think invoking the JVM in Hadoop
>>>>>>>> requires much more memory than an ordinary JVM.
>>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>>>> don't need to increase it. Please post the code reading the object (in
>>>>>>> hdfs?) in your tasks.
>>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>>>> the result after I try the MapFile approach.
>>>>>>>>
>>>>>>>> Shi
>>>>>>>>
>>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>>>>>> JAVA by setting -Xmx as 1000M. However, in hadoop I could never
>>>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>>>> on a single node, how much memory does it require (generally) in
>>>>>>>>>>> Hadoop?
>>>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>>>> configured), you will hit this.
>>>>>>>>>
>>>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>>>> the JVM.
>>>>>>>>>
>>>>>>>>> -Srivas.
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> Shi
>>>>>> --
>>>>>> Postdoctoral Scholar
>>>>>> Institute for Genomics and Systems Biology
>>>>>> Department of Medicine, the University of Chicago
>>>>>> Knapp Center for Biomedical Discovery
>>>>>> 900 E. 57th St. Room 10148
>>>>>> Chicago, IL 60637, US
>>>>>> Tel: 773-702-6799
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>> --
>>>> Postdoctoral Scholar
>>>> Institute for Genomics and Systems Biology
>>>> Department of Medicine, the University of Chicago
>>>> Knapp Center for Biomedical Discovery
>>>> 900 E. 57th St. Room 10148
>>>> Chicago, IL 60637, US
>>>> Tel: 773-702-6799
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu <sh...@uchicago.edu> wrote:
> Hi, thanks for the advice. I tried with your settings,
> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
> still no effect. Or this is a system variable? Should I export it? How to
> configure it?
HADOOP_CLIENT_OPTS is an environment variable, so you should run it as
HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
if you use sh-derivative shells (bash, ksh, etc.); prepend env for other shells.
__Luke
> Shi
>
> java -Xms3G -Xmx3G -classpath
> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
> OOloadtest
>
>
> On 2010-10-13 15:28, Luke Lu wrote:
>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>> try to invoke the same java class using the bin/hadoop command. The thing
>>> is, a very simple program can be executed in Java, but is not doable via
>>> the bin/hadoop command.
>> If you are just trying to use the bin/hadoop jar your.jar command, your
>> code runs in a local client jvm and mapred.child.java.opts has no
>> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
>> jar your.jar
>>> I think if I couldn't get through the first stage, even if I had a
>>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>>
>>> Best Regards,
>>>
>>> Shi
>>>
>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>> Can you post your mapper/reducer implementation? Or are you using
>>>> hadoop streaming, for which mapred.child.java.opts doesn't apply to
>>>> the jvm you care about? BTW, what's the hadoop version you're using?
>>>>
>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>>>> mapred.child.java.opts=-Xmx3000M it runs out of heap space. I have
>>>>> tried other programs in Hadoop with the same settings, so the memory
>>>>> is available on my machines.
>>>>>
>>>>> public static void main(String[] args) {
>>>>>     try{
>>>>>         String myFile = "xxx.dat";
>>>>>         FileInputStream fin = new FileInputStream(myFile);
>>>>>         ois = new ObjectInputStream(fin);
>>>>>         margintagMap = ois.readObject();
>>>>>         ois.close();
>>>>>         fin.close();
>>>>>     }catch(Exception e){
>>>>>         //
>>>>>     }
>>>>> }
>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>> As a follow-up to my own question, I think invoking the JVM in Hadoop
>>>>>>> requires much more memory than an ordinary JVM.
>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>>> don't need to increase it. Please post the code reading the object (in
>>>>>> hdfs?) in your tasks.
>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>>> the result after I try the MapFile approach.
>>>>>>>
>>>>>>> Shi
>>>>>>>
>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>>>>> JAVA by setting -Xmx as 1000M. However, in hadoop I could never
>>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>>> on a single node, how much memory does it require (generally) in
>>>>>>>>>> Hadoop?
>>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>>> configured), you will hit this.
>>>>>>>>
>>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>>> the JVM.
>>>>>>>>
>>>>>>>> -Srivas.
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Shi
>>>>> --
>>>>> Postdoctoral Scholar
>>>>> Institute for Genomics and Systems Biology
>>>>> Department of Medicine, the University of Chicago
>>>>> Knapp Center for Biomedical Discovery
>>>>> 900 E. 57th St. Room 10148
>>>>> Chicago, IL 60637, US
>>>>> Tel: 773-702-6799
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> Postdoctoral Scholar
>>> Institute for Genomics and Systems Biology
>>> Department of Medicine, the University of Chicago
>>> Knapp Center for Biomedical Discovery
>>> 900 E. 57th St. Room 10148
>>> Chicago, IL 60637, US
>>> Tel: 773-702-6799
>>>
>>>
>>>
>
>
>
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Hi, thanks for the advice. I tried with your settings,
$ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
still with no effect. Or is this a system variable? Should I export it?
How do I configure it?
Shi
java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
OOloadtest
On 2010-10-13 15:28, Luke Lu wrote:
> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> I haven't implemented anything in map/reduce yet for this issue. I just try
>> to invoke the same java class using bin/hadoop command. The thing is a
>> very simple program could be executed in Java, but not doable in bin/hadoop
>> command.
>>
> If you are just trying to use bin/hadoop jar your.jar command, your
> code runs in a local client jvm and mapred.child.java.opts has no
> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
> jar your.jar
>
>
>> I think if I couldn't get through the first stage, even I had a
>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>
>> Best Regards,
>>
>> Shi
>>
>> On 2010-10-13 14:15, Luke Lu wrote:
>>
>>> Can you post your mapper/reducer implementation? or are you using
>>> hadoop streaming? for which mapred.child.java.opts doesn't apply to
>>> the jvm you care about. BTW, what's the hadoop version you're using?
>>>
>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>> using
>>>> java -Xmx1000m , however, when using bin/hadoop -D
>>>> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. I
>>>> have tried other program in Hadoop with the same settings so the memory
>>>> is
>>>> available in my machines.
>>>>
>>>>
>>>> public static void main(String[] args) {
>>>> try{
>>>> String myFile = "xxx.dat";
>>>> FileInputStream fin = new FileInputStream(myFile);
>>>> ois = new ObjectInputStream(fin);
>>>> margintagMap = ois.readObject();
>>>> ois.close();
>>>> fin.close();
>>>> }catch(Exception e){
>>>> //
>>>> }
>>>> }
>>>>
>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>> As a follow-up to my own question, I think invoking the JVM in Hadoop
>>>>>> requires much more memory than an ordinary JVM.
>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>> don't need to increase it. Please post the code reading the object (in
>>>>> hdfs?) in your tasks.
>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>> the result after I try the MapFile approach.
>>>>>>
>>>>>> Shi
>>>>>>
>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>>>> JAVA by setting -Xmx as 1000M. However, in hadoop I could never
>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>> on a single node, how much memory does it require (generally) in
>>>>>>>>> Hadoop?
>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>> configured), you will hit this.
>>>>>>>
>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>> the JVM.
>>>>>>>
>>>>>>> -Srivas.
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Shi
>>>> --
>>>> Postdoctoral Scholar
>>>> Institute for Genomics and Systems Biology
>>>> Department of Medicine, the University of Chicago
>>>> Knapp Center for Biomedical Discovery
>>>> 900 E. 57th St. Room 10148
>>>> Chicago, IL 60637, US
>>>> Tel: 773-702-6799
>>>>
>>>>
>>>>
>>>>
>>
>> --
>> Postdoctoral Scholar
>> Institute for Genomics and Systems Biology
>> Department of Medicine, the University of Chicago
>> Knapp Center for Biomedical Discovery
>> 900 E. 57th St. Room 10148
>> Chicago, IL 60637, US
>> Tel: 773-702-6799
>>
>>
>>
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu <sh...@uchicago.edu> wrote:
> I haven't implemented anything in map/reduce yet for this issue. I just try
> to invoke the same java class using bin/hadoop command. The thing is a
> very simple program could be executed in Java, but not doable in bin/hadoop
> command.
If you are just trying to use the bin/hadoop jar your.jar command, your
code runs in a local client JVM and mapred.child.java.opts has no
effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
jar your.jar
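A quick way to verify which heap setting actually took effect is to print the running JVM's maximum heap at startup. This is a minimal sketch (the class name HeapCheck is made up for illustration):

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Reports the max heap of the JVM this code actually runs in,
        // so you can see whether HADOOP_CLIENT_OPTS (or -Xmx) was applied.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap = " + maxMb + " MB");
    }
}
```

Running it once with plain java -Xmx and once through bin/hadoop should make the difference between the client JVM and a task JVM obvious.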
> I think if I couldn't get through the first stage, even I had a
> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>
> Best Regards,
>
> Shi
>
> On 2010-10-13 14:15, Luke Lu wrote:
>>
>> Can you post your mapper/reducer implementation? or are you using
>> hadoop streaming? for which mapred.child.java.opts doesn't apply to
>> the jvm you care about. BTW, what's the hadoop version you're using?
>>
>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>
>>>
>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>> using
>>> java -Xmx1000m , however, when using bin/hadoop -D
>>> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. I
>>> have tried other program in Hadoop with the same settings so the memory
>>> is
>>> available in my machines.
>>>
>>>
>>> public static void main(String[] args) {
>>> try{
>>> String myFile = "xxx.dat";
>>> FileInputStream fin = new FileInputStream(myFile);
>>> ois = new ObjectInputStream(fin);
>>> margintagMap = ois.readObject();
>>> ois.close();
>>> fin.close();
>>> }catch(Exception e){
>>> //
>>> }
>>> }
>>>
>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>
>>>>
>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>
>>>>
>>>>>
>>>>> As a coming-up to the my own question, I think to invoke the JVM in
>>>>> Hadoop
>>>>> requires much more memory than an ordinary JVM.
>>>>>
>>>>>
>>>>
>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>> is much smaller than the standard jvm default 512M and most users
>>>> don't need to increase it. Please post the code reading the object (in
>>>> hdfs?) in your tasks.
>>>>
>>>>
>>>>
>>>>>
>>>>> I found that instead of
>>>>> serialization the object, maybe I could create a MapFile as an index to
>>>>> permit lookups by key in Hadoop. I have also compared the performance
>>>>> of
>>>>> MongoDB and Memcache. I will let you know the result after I try the
>>>>> MapFile
>>>>> approach.
>>>>>
>>>>> Shi
>>>>>
>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>
>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>> stored
>>>>>>>> object is 200M. I could read that object efficiently in JAVA by
>>>>>>>> setting
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> -Xmx
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>>>>>> code
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> is
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> map/reduce
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get
>>>>>>>> the
>>>>>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain
>>>>>>>> a
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> little
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so
>>>>>>>> much
>>>>>>>> memory? If a program requires 1G memory on a single node, how much
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> memory
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> it requires (generally) in Hadoop?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>> process. If your swap is too low (or do not have any swap configured),
>>>>>> you
>>>>>> will hit this.
>>>>>>
>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>> the
>>>>>> JVM.
>>>>>>
>>>>>> -Srivas.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Shi
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> Postdoctoral Scholar
>>> Institute for Genomics and Systems Biology
>>> Department of Medicine, the University of Chicago
>>> Knapp Center for Biomedical Discovery
>>> 900 E. 57th St. Room 10148
>>> Chicago, IL 60637, US
>>> Tel: 773-702-6799
>>>
>>>
>>>
>
>
> --
> Postdoctoral Scholar
> Institute for Genomics and Systems Biology
> Department of Medicine, the University of Chicago
> Knapp Center for Biomedical Discovery
> 900 E. 57th St. Room 10148
> Chicago, IL 60637, US
> Tel: 773-702-6799
>
>
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
I haven't implemented anything in map/reduce yet for this issue. I just
tried to invoke the same Java class using the bin/hadoop command. The
thing is that a very simple program runs fine in plain Java but is not
doable through the bin/hadoop command. I think that if I can't get
through this first stage, even a map/reduce program would also fail. I
am using Hadoop 0.19.2. Thanks.
Best Regards,
Shi
On 2010-10-13 14:15, Luke Lu wrote:
> Can you post your mapper/reducer implementation? or are you using
> hadoop streaming? for which mapred.child.java.opts doesn't apply to
> the jvm you care about. BTW, what's the hadoop version you're using?
>
> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> Here is my code. There is no Map/Reduce in it. I could run this code using
>> java -Xmx1000m , however, when using bin/hadoop -D
>> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. I
>> have tried other program in Hadoop with the same settings so the memory is
>> available in my machines.
>>
>>
>> public static void main(String[] args) {
>> try{
>> String myFile = "xxx.dat";
>> FileInputStream fin = new FileInputStream(myFile);
>> ois = new ObjectInputStream(fin);
>> margintagMap = ois.readObject();
>> ois.close();
>> fin.close();
>> }catch(Exception e){
>> //
>> }
>> }
>>
>> On 2010-10-13 13:30, Luke Lu wrote:
>>
>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>> As a coming-up to the my own question, I think to invoke the JVM in
>>>> Hadoop
>>>> requires much more memory than an ordinary JVM.
>>>>
>>>>
>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>> is much smaller than the standard jvm default 512M and most users
>>> don't need to increase it. Please post the code reading the object (in
>>> hdfs?) in your tasks.
>>>
>>>
>>>
>>>> I found that instead of
>>>> serialization the object, maybe I could create a MapFile as an index to
>>>> permit lookups by key in Hadoop. I have also compared the performance of
>>>> MongoDB and Memcache. I will let you know the result after I try the
>>>> MapFile
>>>> approach.
>>>>
>>>> Shi
>>>>
>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>
>>>>
>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>> stored
>>>>>>> object is 200M. I could read that object efficiently in JAVA by
>>>>>>> setting
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> -Xmx
>>>>>>
>>>>>>
>>>>>>
>>>>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>>>>> code
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> is
>>>>>>
>>>>>>
>>>>>>
>>>>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> map/reduce
>>>>>>
>>>>>>
>>>>>>
>>>>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get
>>>>>>> the
>>>>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> little
>>>>>>
>>>>>>
>>>>>>
>>>>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so
>>>>>>> much
>>>>>>> memory? If a program requires 1G memory on a single node, how much
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> memory
>>>>>>
>>>>>>
>>>>>>
>>>>>>> it requires (generally) in Hadoop?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>> process. If your swap is too low (or do not have any swap configured),
>>>>> you
>>>>> will hit this.
>>>>>
>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in the
>>>>> JVM.
>>>>>
>>>>> -Srivas.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Shi
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
>> --
>> Postdoctoral Scholar
>> Institute for Genomics and Systems Biology
>> Department of Medicine, the University of Chicago
>> Knapp Center for Biomedical Discovery
>> 900 E. 57th St. Room 10148
>> Chicago, IL 60637, US
>> Tel: 773-702-6799
>>
>>
>>
--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
Can you post your mapper/reducer implementation? or are you using
hadoop streaming? for which mapred.child.java.opts doesn't apply to
the jvm you care about. BTW, what's the hadoop version you're using?
On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu <sh...@uchicago.edu> wrote:
> Here is my code. There is no Map/Reduce in it. I could run this code using
> java -Xmx1000m , however, when using bin/hadoop -D
> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. I
> have tried other program in Hadoop with the same settings so the memory is
> available in my machines.
>
>
> public static void main(String[] args) {
> try{
> String myFile = "xxx.dat";
> FileInputStream fin = new FileInputStream(myFile);
> ois = new ObjectInputStream(fin);
> margintagMap = ois.readObject();
> ois.close();
> fin.close();
> }catch(Exception e){
> //
> }
> }
>
> On 2010-10-13 13:30, Luke Lu wrote:
>>
>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>
>>>
>>> As a coming-up to the my own question, I think to invoke the JVM in
>>> Hadoop
>>> requires much more memory than an ordinary JVM.
>>>
>>
>> That's simply not true. The default mapreduce task Xmx is 200M, which
>> is much smaller than the standard jvm default 512M and most users
>> don't need to increase it. Please post the code reading the object (in
>> hdfs?) in your tasks.
>>
>>
>>>
>>> I found that instead of
>>> serialization the object, maybe I could create a MapFile as an index to
>>> permit lookups by key in Hadoop. I have also compared the performance of
>>> MongoDB and Memcache. I will let you know the result after I try the
>>> MapFile
>>> approach.
>>>
>>> Shi
>>>
>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>
>>>>>
>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>> stored
>>>>>> object is 200M. I could read that object efficiently in JAVA by
>>>>>> setting
>>>>>>
>>>>>>
>>>>>
>>>>> -Xmx
>>>>>
>>>>>
>>>>>>
>>>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>>>> code
>>>>>>
>>>>>>
>>>>>
>>>>> is
>>>>>
>>>>>
>>>>>>
>>>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>>>
>>>>>>
>>>>>
>>>>> map/reduce
>>>>>
>>>>>
>>>>>>
>>>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get
>>>>>> the
>>>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>>>>
>>>>>>
>>>>>
>>>>> little
>>>>>
>>>>>
>>>>>>
>>>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so
>>>>>> much
>>>>>> memory? If a program requires 1G memory on a single node, how much
>>>>>>
>>>>>>
>>>>>
>>>>> memory
>>>>>
>>>>>
>>>>>>
>>>>>> it requires (generally) in Hadoop?
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> The JVM reserves swap space in advance, at the time of launching the
>>>> process. If your swap is too low (or do not have any swap configured),
>>>> you
>>>> will hit this.
>>>>
>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in the
>>>> JVM.
>>>>
>>>> -Srivas.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Shi
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>
>
> --
> Postdoctoral Scholar
> Institute for Genomics and Systems Biology
> Department of Medicine, the University of Chicago
> Knapp Center for Biomedical Discovery
> 900 E. 57th St. Room 10148
> Chicago, IL 60637, US
> Tel: 773-702-6799
>
>
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Here is my code. There is no Map/Reduce in it. I can run this code
using java -Xmx1000m; however, when using bin/hadoop -D
mapred.child.java.opts=-Xmx3000M it fails with a Java heap space error.
I have tried other programs in Hadoop with the same settings, so the
memory is available on my machines.

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.HashMap;

public static void main(String[] args) {
    try {
        String myFile = "xxx.dat";
        FileInputStream fin = new FileInputStream(myFile);
        ObjectInputStream ois = new ObjectInputStream(fin);
        // readObject() returns Object, so a cast is needed to get the map back
        HashMap<?, ?> margintagMap = (HashMap<?, ?>) ois.readObject();
        ois.close();
        fin.close();
    } catch (Exception e) {
        e.printStackTrace(); // don't swallow the error silently
    }
}
On 2010-10-13 13:30, Luke Lu wrote:
> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> As a coming-up to the my own question, I think to invoke the JVM in Hadoop
>> requires much more memory than an ordinary JVM.
>>
> That's simply not true. The default mapreduce task Xmx is 200M, which
> is much smaller than the standard jvm default 512M and most users
> don't need to increase it. Please post the code reading the object (in
> hdfs?) in your tasks.
>
>
>> I found that instead of
>> serialization the object, maybe I could create a MapFile as an index to
>> permit lookups by key in Hadoop. I have also compared the performance of
>> MongoDB and Memcache. I will let you know the result after I try the MapFile
>> approach.
>>
>> Shi
>>
>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>
>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to load a serialized HashMap object in hadoop. The file of stored
>>>>> object is 200M. I could read that object efficiently in JAVA by setting
>>>>>
>>>>>
>>>> -Xmx
>>>>
>>>>
>>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>>> code
>>>>>
>>>>>
>>>> is
>>>>
>>>>
>>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>>
>>>>>
>>>> map/reduce
>>>>
>>>>
>>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
>>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>>>
>>>>>
>>>> little
>>>>
>>>>
>>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
>>>>> memory? If a program requires 1G memory on a single node, how much
>>>>>
>>>>>
>>>> memory
>>>>
>>>>
>>>>> it requires (generally) in Hadoop?
>>>>>
>>>>>
>>>>
>>>>
>>> The JVM reserves swap space in advance, at the time of launching the
>>> process. If your swap is too low (or do not have any swap configured), you
>>> will hit this.
>>>
>>> Or, you are on a 32-bit machine, in which case 3G is not possible in the
>>> JVM.
>>>
>>> -Srivas.
>>>
>>>
>>>
>>>
>>>
>>>>> Thanks.
>>>>>
>>>>> Shi
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu <sh...@uchicago.edu> wrote:
> As a coming-up to the my own question, I think to invoke the JVM in Hadoop
> requires much more memory than an ordinary JVM.
That's simply not true. The default mapreduce task Xmx is 200M, which
is much smaller than the standard JVM default of 512M, and most users
don't need to increase it. Please post the code that reads the object
(from HDFS?) in your tasks.
> I found that instead of
> serialization the object, maybe I could create a MapFile as an index to
> permit lookups by key in Hadoop. I have also compared the performance of
> MongoDB and Memcache. I will let you know the result after I try the MapFile
> approach.
>
> Shi
>
> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>
>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>>
>>>> Hi,
>>>>
>>>> I want to load a serialized HashMap object in hadoop. The file of stored
>>>> object is 200M. I could read that object efficiently in JAVA by setting
>>>>
>>>
>>> -Xmx
>>>
>>>>
>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>> code
>>>>
>>>
>>> is
>>>
>>>>
>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>
>>>
>>> map/reduce
>>>
>>>>
>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>>
>>>
>>> little
>>>
>>>>
>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
>>>> memory? If a program requires 1G memory on a single node, how much
>>>>
>>>
>>> memory
>>>
>>>>
>>>> it requires (generally) in Hadoop?
>>>>
>>>
>>>
>>
>> The JVM reserves swap space in advance, at the time of launching the
>> process. If your swap is too low (or do not have any swap configured), you
>> will hit this.
>>
>> Or, you are on a 32-bit machine, in which case 3G is not possible in the
>> JVM.
>>
>> -Srivas.
>>
>>
>>
>>
>>>>
>>>> Thanks.
>>>>
>>>> Shi
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
Re: load a serialized object in hadoop
Posted by Matt Pouttu-Clarke <Ma...@icrossing.com>.
Also, Java serialization often keeps references to previously read
objects in memory. It is better to use Thrift or Avro to serialize the
object.
In my experience serialization is inefficient for large object graphs,
but works fine for smaller graphs (depending on how much memory you
have to work with).
Also, for that small an amount of data, Memcache and MongoDB may be
overkill (unless the data changes frequently).
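One dependency-free way to sidestep the reference tracking described above is to write the map as flat key/value records with DataOutputStream instead of serializing the HashMap object itself. A rough sketch (class and file names are illustrative):

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class FlatMapFile {
    // Write the map as an entry count followed by flat key/value pairs.
    // Note: writeUTF caps each string at 65,535 encoded bytes.
    static void write(Map<String, String> map, String path) throws IOException {
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)));
        out.writeInt(map.size());
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        out.close();
    }

    // Read it back without ObjectInputStream's per-object bookkeeping.
    static HashMap<String, String> read(String path) throws IOException {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)));
        int n = in.readInt();
        HashMap<String, String> map = new HashMap<String, String>(n * 2);
        for (int i = 0; i < n; i++) {
            map.put(in.readUTF(), in.readUTF());
        }
        in.close();
        return map;
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, String> m = new HashMap<String, String>();
        m.put("token", "tag");
        write(m, "flat.dat");
        System.out.println(read("flat.dat").equals(m)); // prints "true"
    }
}
```

The reader allocates only the map entries themselves, so peak memory tracks the data rather than the serialization machinery.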
Cheers,
Matt
On Oct 13, 2010, at 11:04 AM, "Shi Yu" <sh...@uchicago.edu> wrote:
> As a coming-up to the my own question, I think to invoke the JVM in
> Hadoop requires much more memory than an ordinary JVM. I found that
> instead of serialization the object, maybe I could create a MapFile
> as an index to permit lookups by key in Hadoop. I have also compared
> the performance of MongoDB and Memcache. I will let you know the
> result after I try the MapFile approach.
>
> Shi
>
> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>
>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>> stored
>>>> object is 200M. I could read that object efficiently in JAVA by
>>>> setting
>>>>
>>> -Xmx
>>>
>>>> as 1000M. However, in hadoop I could never load it into memory.
>>>> The code
>>>>
>>> is
>>>
>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>
>>> map/reduce
>>>
>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still
>>>> get the
>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone
>>>> explain a
>>>>
>>> little
>>>
>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up
>>>> so much
>>>> memory? If a program requires 1G memory on a single node, how much
>>>>
>>> memory
>>>
>>>> it requires (generally) in Hadoop?
>>>>
>>>
>> The JVM reserves swap space in advance, at the time of launching the
>> process. If your swap is too low (or do not have any swap
>> configured), you
>> will hit this.
>>
>> Or, you are on a 32-bit machine, in which case 3G is not possible
>> in the
>> JVM.
>>
>> -Srivas.
>>
>>
>>
>>
>>>> Thanks.
>>>>
>>>> Shi
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>
>>
>
>
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
As a follow-up to my own question: I think invoking the JVM in Hadoop
requires much more memory than an ordinary JVM. I found that instead of
serializing the object, maybe I could create a MapFile as an index to
permit lookups by key in Hadoop. I have also compared the performance
of MongoDB and Memcache. I will let you know the result after I try the
MapFile approach.
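For the record, a minimal sketch of the MapFile approach against the 0.19-era API might look like the following; the directory name and Text key/value types are assumptions, and note that MapFile requires keys to be appended in sorted order:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);

        // Build the index once; keys MUST be appended in sorted order.
        MapFile.Writer writer =
                new MapFile.Writer(conf, fs, "lookup.map", Text.class, Text.class);
        writer.append(new Text("apple"), new Text("fruit"));
        writer.append(new Text("beet"), new Text("vegetable"));
        writer.close();

        // Later (e.g. inside a mapper), look up individual keys without
        // deserializing the whole map into the heap.
        MapFile.Reader reader = new MapFile.Reader(fs, "lookup.map", conf);
        Text value = new Text();
        reader.get(new Text("beet"), value);
        System.out.println(value);
        reader.close();
    }
}
```

Only the MapFile's in-memory index (a sample of the keys) is held on the heap, so lookups stay cheap even when the data file is large.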
Shi
On 2010-10-12 21:59, M. C. Srivas wrote:
>>
>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>
>>
>>> Hi,
>>>
>>> I want to load a serialized HashMap object in hadoop. The file of stored
>>> object is 200M. I could read that object efficiently in JAVA by setting
>>>
>> -Xmx
>>
>>> as 1000M. However, in hadoop I could never load it into memory. The code
>>>
>> is
>>
>>> very simple (just read the ObjectInputStream) and there is yet no
>>>
>> map/reduce
>>
>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>
>> little
>>
>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
>>> memory? If a program requires 1G memory on a single node, how much
>>>
>> memory
>>
>>> it requires (generally) in Hadoop?
>>>
>>
> The JVM reserves swap space in advance, at the time of launching the
> process. If your swap is too low (or do not have any swap configured), you
> will hit this.
>
> Or, you are on a 32-bit machine, in which case 3G is not possible in the
> JVM.
>
> -Srivas.
>
>
>
>
>>> Thanks.
>>>
>>> Shi
>>>
>>> --
>>>
>>>
>>>
>>
>
Re: load a serialized object in hadoop
Posted by "M. C. Srivas" <mc...@gmail.com>.
>
>
> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <sh...@uchicago.edu> wrote:
>
> > Hi,
> >
> > I want to load a serialized HashMap object in hadoop. The file of stored
> > object is 200M. I could read that object efficiently in JAVA by setting
> -Xmx
> > as 1000M. However, in hadoop I could never load it into memory. The code
> is
> > very simple (just read the ObjectInputStream) and there is yet no
> map/reduce
> > implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
> > "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
> little
> > bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
> > memory? If a program requires 1G memory on a single node, how much
> memory
> > it requires (generally) in Hadoop?
>
The JVM reserves swap space in advance, at the time of launching the
process. If your swap is too low (or you do not have any swap
configured), you will hit this.
Or, you are on a 32-bit machine, in which case 3G is not possible in the
JVM.
-Srivas.
> >
> > Thanks.
> >
> > Shi
> >
> > --
> >
> >
>
Re: load a serialized object in hadoop
Posted by Charles Lee <li...@gmail.com>.
On a 32-bit machine, the biggest heap the JVM can provide is in the
range of 1.5 GB to 2.0 GB. So if you want a bigger heap, say 3000M, you
should use a 64-bit machine.
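A quick way to check which data model the JVM itself is running under; note that sun.arch.data.model is a Sun/Oracle-specific property and may be absent on other JVMs, hence the fallback:

```java
public class ArchCheck {
    public static void main(String[] args) {
        // "sun.arch.data.model" is "32" or "64" on Sun/Oracle JVMs;
        // fall back to os.arch on JVMs that don't define it.
        String bits = System.getProperty("sun.arch.data.model",
                                         System.getProperty("os.arch"));
        System.out.println("JVM data model: " + bits);
    }
}
```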
On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <sh...@uchicago.edu> wrote:
> Hi,
>
> I want to load a serialized HashMap object in hadoop. The file of stored
> object is 200M. I could read that object efficiently in JAVA by setting -Xmx
> as 1000M. However, in hadoop I could never load it into memory. The code is
> very simple (just read the ObjectInputStream) and there is yet no map/reduce
> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a little
> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
> memory? If a program requires 1G memory on a single node, how much memory
> it requires (generally) in Hadoop?
>
> Thanks.
>
> Shi
>
> --
>
>
--
Yours sincerely,
Charles Lee