Posted to common-user@hadoop.apache.org by Shi Yu <sh...@uchicago.edu> on 2010/10/11 22:50:03 UTC
load a serialized object in hadoop
Hi,
I want to load a serialized HashMap object in Hadoop. The file of the
stored object is 200 MB. I can read that object efficiently in plain
Java by setting -Xmx to 1000M. However, in Hadoop I can never load it
into memory. The code is very simple (it just reads an
ObjectInputStream) and there is no map/reduce implemented yet. I set
mapred.child.java.opts=-Xmx3000M and still get
"java.lang.OutOfMemoryError: Java heap space". Could anyone explain a
little how memory is allocated to the JVM in Hadoop? Why does Hadoop
take up so much memory? If a program requires 1 GB of memory on a single
node, how much memory does it (generally) require in Hadoop?
Thanks.
Shi
--
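[Editor's note] The loading pattern described above can be reproduced end to end at toy scale. The sketch below (class, method, and file names are illustrative, not from the thread) serializes a HashMap with ObjectOutputStream and reads it back with ObjectInputStream, the same pattern as the 200 MB xxx.dat case:

```java
import java.io.*;
import java.util.HashMap;

public class SerializedMapDemo {

    // Serialize a HashMap to a temp file and read it back -- the same
    // ObjectInputStream pattern used in the thread, at toy scale.
    static int roundTrip(int entries) throws IOException, ClassNotFoundException {
        HashMap<String, Integer> map = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            map.put("key" + i, i);
        }

        File f = File.createTempFile("margintag", ".dat");
        f.deleteOnExit();
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            oos.writeObject(map);
        }

        // readObject() returns Object, so the result must be cast back.
        try (ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(f)))) {
            @SuppressWarnings("unchecked")
            HashMap<String, Integer> loaded = (HashMap<String, Integer>) ois.readObject();
            return loaded.size();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(1000));  // prints 1000
    }
}
```

Note that a deserialized map typically occupies several times its on-disk size in live heap (boxed keys and values plus HashMap bucket overhead), which is one reason a 200 MB file can exhaust a heap much larger than 200 MB.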
Re: load a serialized object in hadoop
Posted by Konstantin Boudnik <co...@boudnik.org>.
You should have no space here "-D HADOOP_CLIENT_OPTS"
On Wed, Oct 13, 2010 at 04:21PM, Shi Yu wrote:
> Hi, thanks for the advice. I tried with your settings,
> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>
> still no effect. Or is this a system variable? Should I export it?
> How do I configure it?
>
> Shi
>
> java -Xms3G -Xmx3G -classpath .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
> OOloadtest
>
>
> On 2010-10-13 15:28, Luke Lu wrote:
> >On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
> >>I haven't implemented anything in map/reduce yet for this issue. I just
> >>tried to invoke the same java class using the bin/hadoop command. The
> >>thing is, a very simple program can be executed in Java, but not via the
> >>bin/hadoop command.
> >If you are just trying to use bin/hadoop jar your.jar command, your
> >code runs in a local client jvm and mapred.child.java.opts has no
> >effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
> >jar your.jar
> >
> >>I think if I couldn't get through the first stage, even if I had a
> >>map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
> >>
> >>Best Regards,
> >>
> >>Shi
> >>
> >>On 2010-10-13 14:15, Luke Lu wrote:
> >>>Can you post your mapper/reducer implementation? or are you using
> >>>hadoop streaming? for which mapred.child.java.opts doesn't apply to
> >>>the jvm you care about. BTW, what's the hadoop version you're using?
> >>>
> >>>On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
> >>>
> >>>>Here is my code. There is no Map/Reduce in it. I can run this code
> >>>>using java -Xmx1000m; however, when using bin/hadoop -D
> >>>>mapred.child.java.opts=-Xmx3000M it fails with a heap-space error. I
> >>>>have tried other programs in Hadoop with the same settings, so the
> >>>>memory is available on my machines.
> >>>>
> >>>>
> >>>>public static void main(String[] args) {
> >>>>    try {
> >>>>        String myFile = "xxx.dat";
> >>>>        FileInputStream fin = new FileInputStream(myFile);
> >>>>        ObjectInputStream ois = new ObjectInputStream(fin);
> >>>>        Object margintagMap = ois.readObject();
> >>>>        ois.close();
> >>>>        fin.close();
> >>>>    } catch (Exception e) {
> >>>>        e.printStackTrace();  // at least log the failure instead of swallowing it
> >>>>    }
> >>>>}
> >>>>
> >>>>On 2010-10-13 13:30, Luke Lu wrote:
> >>>>
> >>>>>On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
> >>>>>
> >>>>>
> >>>>>>As a follow-up to my own question, I think invoking the JVM in
> >>>>>>Hadoop requires much more memory than an ordinary JVM.
> >>>>>>
> >>>>>>
> >>>>>That's simply not true. The default mapreduce task Xmx is 200M, which
> >>>>>is much smaller than the standard jvm default 512M and most users
> >>>>>don't need to increase it. Please post the code reading the object (in
> >>>>>hdfs?) in your tasks.
> >>>>>
> >>>>>
> >>>>>
> >>>>>>I found that instead of serializing the object, maybe I could create
> >>>>>>a MapFile as an index to permit lookups by key in Hadoop. I have also
> >>>>>>compared the performance of MongoDB and Memcached. I will let you
> >>>>>>know the result after I try the MapFile approach.
> >>>>>>
> >>>>>>Shi
> >>>>>>
> >>>>>>On 2010-10-12 21:59, M. C. Srivas wrote:
> >>>>>>
> >>>>>>>>On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>[...]
> >>>>>>>
> >>>>>>>The JVM reserves swap space in advance, at the time of launching the
> >>>>>>>process. If your swap is too low (or you do not have any swap
> >>>>>>>configured), you will hit this.
> >>>>>>>
> >>>>>>>Or, you are on a 32-bit machine, in which case 3G is not possible in
> >>>>>>>the JVM.
> >>>>>>>
> >>>>>>>-Srivas.
> >>>>>>
> >>>>--
> >>>>Postdoctoral Scholar
> >>>>Institute for Genomics and Systems Biology
> >>>>Department of Medicine, the University of Chicago
> >>>>Knapp Center for Biomedical Discovery
> >>>>900 E. 57th St. Room 10148
> >>>>Chicago, IL 60637, US
> >>>>Tel: 773-702-6799
> >>>>
> >>>>
> >>>>
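[Editor's note] The MapFile idea floated above — keep the data on disk, hold only a small key-to-offset index in memory, and seek to a value by key — can be sketched in plain Java without any Hadoop dependency. Class, method, and file names here are illustrative, not from the thread:

```java
import java.io.*;
import java.util.TreeMap;

public class IndexedLookupDemo {

    // Write sorted key/value records to disk, keeping only a small
    // key -> file-offset index in memory (the idea behind Hadoop's MapFile).
    static int lookupDemo() throws IOException {
        File data = File.createTempFile("lookup", ".dat");
        data.deleteOnExit();

        TreeMap<String, Long> index = new TreeMap<>();
        try (RandomAccessFile out = new RandomAccessFile(data, "rw")) {
            for (int i = 0; i < 100; i++) {
                String key = String.format("key%03d", i);  // MapFile likewise requires sorted keys
                index.put(key, out.getFilePointer());
                out.writeUTF(key);
                out.writeInt(i * i);                       // the value
            }
        }

        // Look one key up by seeking to its offset, instead of
        // deserializing the whole structure into the heap.
        try (RandomAccessFile in = new RandomAccessFile(data, "r")) {
            in.seek(index.get("key042"));
            in.readUTF();            // consume the key
            return in.readInt();     // 42 * 42
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lookupDemo());  // prints 1764
    }
}
```

The real MapFile adds block compression and a sparse index, but the memory trade-off is the same: the heap holds offsets, not values.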
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Just a reminder and a warning: changing the
HADOOP_CLIENT_OPTS
HADOOP_OPTS
entries in hadoop-env.sh improperly may cause Hadoop to crash.
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Thanks. Well, I set the value to 3000M in hadoop-env.sh so that it has
the same configuration as Java:
export HADOOP_HEAPSIZE=3000
export HADOOP_CLIENT_OPTS=3000
Then I did the comparison:
sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ bin/hadoop jar WordCount.jar
OOloadtest
timing (hms): 0 hour(s) 2 minute(s) 53 second(s) 599millisecond(s)
sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
timing (hms): 0 hour(s) 1 minute(s) 14 second(s) 7millisecond(s)
It seems the hadoop command is 50% slower. I guess that was because I
didn't set the "initial heap size" correctly (corresponding to the -Xms
option in Java). I tried this:
sheeyu@ocuic3:~/hadoop/hadoop-0.19.2$ bin/hadoop jar WordCount.jar
OOloadtest -D mapred.child.java.opts=-Xmx3000M
timing (hms): 0 hour(s) 3 minute(s) 0 second(s) 343millisecond(s)
I also tried
HADOOP_OPTS=-Xmx3000M bin/hadoop jar WordCount.jar OOloadtest
timing (hms): 0 hour(s) 3 minute(s) 7 second(s) 774millisecond(s)
Now it works!
So how do I set the initial heap size: HADOOP_NAMENODE_OPTS or
HADOOP_CLIENT_OPTS? Because there is a 50% difference in speed.
Shi
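[Editor's note] Since HADOOP_OPTS is simply prepended to the java command line by the bin/hadoop script, the missing -Xms can be supplied the same way. A sketch using the thread's own sizes (values illustrative):

```shell
# One-shot: pass both initial (-Xms) and maximum (-Xmx) heap to the
# client JVM. On 0.19.2 HADOOP_OPTS reaches the `jar` command; on later
# releases HADOOP_CLIENT_OPTS is the supported knob for client commands.
HADOOP_OPTS="-Xms3000m -Xmx3000m" bin/hadoop jar WordCount.jar OOloadtest

# Or persistently, in conf/hadoop-env.sh:
export HADOOP_CLIENT_OPTS="-Xms3000m -Xmx3000m"
```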
On 2010-10-13 19:04, Luke Lu wrote:
> Just took a look at the bin/hadoop of your particular version
> (http://svn.apache.org/viewvc/hadoop/common/tags/release-0.19.2/bin/hadoop?revision=796970&view=markup).
> It looks like that HADOOP_CLIENT_OPTS doesn't work with the jar
> command, which is fixed in later version.
>
> So try HADOOP_OPTS=-Xmx1000M bin/hadoop ... instead. It would work
> because it just translates to the same java command line that worked
> for you :)
>
> __Luke
>
> On Wed, Oct 13, 2010 at 4:18 PM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> Hi, I tried the following five ways:
>>
>> Approach 1: in command line
>> HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
>>
>>
>> Approach 2: I added the hadoop-site.xml file with the following element.
>> Each time I changed, I stop and restart hadoop on all the nodes.
>> ...
>> <property>
>> <name>HADOOP_CLIENT_OPTS</name>
>> <value>-Xmx4000m</value>
>> </property>
>>
>> run the command
>> $bin/hadoop jar WordCount.jar OOloadtest
>>
>> Approach 3: I changed like this
>> ...
>> <property>
>> <name>HADOOP_CLIENT_OPTS</name>
>> <value>4000m</value>
>> </property>
>> ....
>>
>> Then run the command:
>> $bin/hadoop jar WordCount.jar OOloadtest
>>
>> Approach 4: To make sure, I changed the "m" to numbers, that was
>> ...
>> <property>
>> <name>HADOOP_CLIENT_OPTS</name>
>> <value>4000000000</value>
>> </property>
>> ....
>>
>> Then run the command:
>> $bin/hadoop jar WordCount.jar OOloadtest
>>
>> All these four approaches come to the same "Java heap space" error.
>>
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
>> at java.lang.StringBuilder.<init>(StringBuilder.java:68)
>> at
>> java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
>> at
>> java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
>> at java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>> at java.util.HashMap.readObject(HashMap.java:1028)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at
>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
>> at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
>> at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>> at ObjectManager.loadObject(ObjectManager.java:42)
>> at OOloadtest.main(OOloadtest.java:21)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>> at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>
>>
>> Approach 5:
>> In comparison, I called the Java command directly as follows (there is a
>> counter showing how much time it costs if the serialized object is
>> successfully loaded):
>>
>> $java -Xms3G -Xmx3G -classpath
>> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
>>
>> return:
>> object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s)
>> 162millisecond(s)
>>
>>
>> What was the problem in my command? Where can I find the documentation about
>> HADOOP_CLIENT_OPTS? Have you tried the same thing and found it works?
>>
>> Shi
>>
>>
>> On 2010-10-13 16:28, Luke Lu wrote:
>>
>>> On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>> Hi, thanks for the advice. I tried with your settings,
>>>> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>>>> still no effect. Or this is a system variable? Should I export it? How to
>>>> configure it?
>>>>
>>>>
>>> HADOOP_CLIENT_OPTS is an environment variable so you should run it as
>>> HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
>>>
>>> if you use an sh-derivative shell (bash, ksh, etc.); prepend env for
>>> other shells.
>>>
>>> __Luke
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
Just took a look at the bin/hadoop of your particular version
(http://svn.apache.org/viewvc/hadoop/common/tags/release-0.19.2/bin/hadoop?revision=796970&view=markup).
It looks like HADOOP_CLIENT_OPTS doesn't work with the jar
command; this is fixed in later versions.
So try HADOOP_OPTS=-Xmx1000M bin/hadoop ... instead. It would work
because it just translates to the same java command line that worked
for you :)
__Luke
On Wed, Oct 13, 2010 at 4:18 PM, Shi Yu <sh...@uchicago.edu> wrote:
> Hi, I tried the following five ways:
>
> Approach 1: in command line
> HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
>
> [...]
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Hi, I got it: it should be declared in hadoop-env.sh:
export HADOOP_CLIENT_OPTS=-Xmx4000m
Thanks! At the same time I see corrections coming in.
Shi
On 2010-10-13 18:18, Shi Yu wrote:
> Hi, I tried the following five ways:
>
> Approach 1: in command line
> HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
>
>
> Approach 2: I added the hadoop-site.xml file with the following
> element. Each time I changed, I stop and restart hadoop on all the nodes.
> ...
> <property>
> <name>HADOOP_CLIENT_OPTS</name>
> <value>-Xmx4000m</value>
> </property>
>
> run the command
> $bin/hadoop jar WordCount.jar OOloadtest
>
> Approach 3: I changed like this
> ...
> <property>
> <name>HADOOP_CLIENT_OPTS</name>
> <value>4000m</value>
> </property>
> ....
>
> Then run the command:
> $bin/hadoop jar WordCount.jar OOloadtest
>
> Approach 4: To make sure, I changed the "m" to numbers, that was
> ...
> <property>
> <name>HADOOP_CLIENT_OPTS</name>
> <value>4000000000</value>
> </property>
> ....
>
> Then run the command:
> $bin/hadoop jar WordCount.jar OOloadtest
>
> All these four approaches come to the same "Java heap space" error.
>
> java.lang.OutOfMemoryError: Java heap space
> at
> java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
> at java.lang.StringBuilder.<init>(StringBuilder.java:68)
> at
> java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
>
> at
> java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
>
> at
> java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
> at
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
> at java.util.HashMap.readObject(HashMap.java:1028)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
> at
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
> at ObjectManager.loadObject(ObjectManager.java:42)
> at OOloadtest.main(OOloadtest.java:21)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
>
> Approach 5:
> In comparison, I called the Java command directly as follows (there is
> a counter showing how much time it costs if the serialized object is
> successfully loaded):
>
> $java -Xms3G -Xmx3G -classpath
> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
>
> return:
> object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s)
> 162millisecond(s)
>
>
> What was the problem in my command? Where can I find the documentation
> about HADOOP_CLIENT_OPTS? Have you tried the same thing and found it
> works?
>
> Shi
>
>
> On 2010-10-13 16:28, Luke Lu wrote:
>> On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>> Hi, thanks for the advice. I tried with your settings,
>>> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>>> still no effect. Or this is a system variable? Should I export it?
>>> How to
>>> configure it?
>> HADOOP_CLIENT_OPTS is an environment variable so you should run it as
>> HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
>>
>> if you use sh derivative shells (bash, ksh etc.) prepend env for
>> other shells.
>>
>> __Luke
>>
>>
>>> Shi
>>>
>>> java -Xms3G -Xmx3G -classpath
>>> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
>>>
>>> OOloadtest
>>>
>>>
>>> On 2010-10-13 15:28, Luke Lu wrote:
>>>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>>>> try to invoke the same java class using the bin/hadoop command. The thing
>>>>> is, a very simple program can be executed in Java, but is not doable via
>>>>> the bin/hadoop command.
>>>> If you are just trying to use the bin/hadoop jar your.jar command, your
>>>> code runs in a local client jvm and mapred.child.java.opts has no
>>>> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
>>>> jar your.jar
>>>>> I think if I couldn't get through the first stage, even if I had a
>>>>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Shi
>>>>>
>>>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>>>> Can you post your mapper/reducer implementation? Or are you using
>>>>>> hadoop streaming, for which mapred.child.java.opts doesn't apply to
>>>>>> the jvm you care about? BTW, what's the hadoop version you're using?
>>>>>>
>>>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>>>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>>>>>> mapred.child.java.opts=-Xmx3000M it runs out of heap space. I have
>>>>>>> tried other programs in Hadoop with the same settings, so the memory
>>>>>>> is available on my machines.
>>>>>>>
>>>>>>> public static void main(String[] args) {
>>>>>>>     try{
>>>>>>>         String myFile = "xxx.dat";
>>>>>>>         FileInputStream fin = new FileInputStream(myFile);
>>>>>>>         ois = new ObjectInputStream(fin);
>>>>>>>         margintagMap = ois.readObject();
>>>>>>>         ois.close();
>>>>>>>         fin.close();
>>>>>>>     }catch(Exception e){
>>>>>>>         //
>>>>>>>     }
>>>>>>> }
>>>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>> As a follow-up to my own question, I think invoking the JVM in Hadoop
>>>>>>>>> requires much more memory than an ordinary JVM.
>>>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>>>>> don't need to increase it. Please post the code reading the object (in
>>>>>>>> hdfs?) in your tasks.
>>>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>>>>> the result after I try the MapFile approach.
>>>>>>>>>
>>>>>>>>> Shi
>>>>>>>>>
>>>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>>>>>>> JAVA by setting -Xmx as 1000M. However, in hadoop I could never
>>>>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>>>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>>>>> on a single node, how much memory does it require (generally) in
>>>>>>>>>>>> Hadoop?
>>>>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>>>>> configured), you will hit this.
>>>>>>>>>>
>>>>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>>>>> the JVM.
>>>>>>>>>>
>>>>>>>>>> -Srivas.
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> Shi
>>>>>>> --
>>>>>>> Postdoctoral Scholar
>>>>>>> Institute for Genomics and Systems Biology
>>>>>>> Department of Medicine, the University of Chicago
>>>>>>> Knapp Center for Biomedical Discovery
>>>>>>> 900 E. 57th St. Room 10148
>>>>>>> Chicago, IL 60637, US
>>>>>>> Tel: 773-702-6799
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Postdoctoral Scholar
>>>>> Institute for Genomics and Systems Biology
>>>>> Department of Medicine, the University of Chicago
>>>>> Knapp Center for Biomedical Discovery
>>>>> 900 E. 57th St. Room 10148
>>>>> Chicago, IL 60637, US
>>>>> Tel: 773-702-6799
>>>>>
>>>>>
>>>>>
>>>
>>>
>
>
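For readers skimming the archive, the fix above boils down to two shell-level options. This is a sketch: the jar and class names are the ones from this thread, the hadoop lines are shown as comments because they need a hadoop install, and only the last line is directly executable.

```shell
# One-off (sh/bash/ksh): a VAR=value prefix applies only to that one command.
#   HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest

# Persistent: export the variable from conf/hadoop-env.sh so every
# client-side invocation of bin/hadoop picks it up.
#   export HADOOP_CLIENT_OPTS=-Xmx4000m

# The prefix form is plain Bourne-shell semantics, demonstrable without hadoop:
HADOOP_CLIENT_OPTS=-Xmx4000m sh -c 'echo "$HADOOP_CLIENT_OPTS"'   # prints -Xmx4000m
```

Note this only sizes the client JVM; the map/reduce child JVMs are still governed by mapred.child.java.opts.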
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Hi, I tried the following five ways:
Approach 1: in command line
HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
Approach 2: I added the following element to the hadoop-site.xml file.
Each time I changed it, I stopped and restarted hadoop on all the nodes.
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>-Xmx4000m</value>
</property>
run the command
$bin/hadoop jar WordCount.jar OOloadtest
Approach 3: I changed it like this
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>4000m</value>
</property>
....
Then run the command:
$bin/hadoop jar WordCount.jar OOloadtest
Approach 4: To be sure, I replaced the "m" with a plain number, that is
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>4000000000</value>
</property>
....
Then run the command:
$bin/hadoop jar WordCount.jar OOloadtest
All four approaches end in the same "Java heap space" error.
java.lang.OutOfMemoryError: Java heap space
at
java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:68)
at
java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
at
java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
at
java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at java.util.HashMap.readObject(HashMap.java:1028)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at ObjectManager.loadObject(ObjectManager.java:42)
at OOloadtest.main(OOloadtest.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Approach 5:
For comparison, I ran the Java command directly as follows (a counter
reports how long loading takes when the serialized object is successfully
loaded):
$java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
return:
object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s)
162millisecond(s)
What was wrong with my command? Where can I find the documentation about
HADOOP_CLIENT_OPTS? Have you tried the same thing and found that it works?
Shi
On 2010-10-13 16:28, Luke Lu wrote:
> On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> Hi, thanks for the advice. I tried with your settings,
>> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>> still no effect. Or this is a system variable? Should I export it? How to
>> configure it?
>>
> HADOOP_CLIENT_OPTS is an environment variable so you should run it as
> HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
>
> if you use sh derivative shells (bash, ksh etc.) prepend env for other shells.
>
> __Luke
>
>
>
>> Shi
>>
>> java -Xms3G -Xmx3G -classpath
>> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
>> OOloadtest
>>
>>
>> On 2010-10-13 15:28, Luke Lu wrote:
>>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>>> try to invoke the same java class using the bin/hadoop command. The thing
>>>> is, a very simple program can be executed in Java, but is not doable via
>>>> the bin/hadoop command.
>>> If you are just trying to use the bin/hadoop jar your.jar command, your
>>> code runs in a local client jvm and mapred.child.java.opts has no
>>> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
>>> jar your.jar
>>>> I think if I couldn't get through the first stage, even if I had a
>>>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>>>
>>>> Best Regards,
>>>>
>>>> Shi
>>>>
>>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>>> Can you post your mapper/reducer implementation? Or are you using
>>>>> hadoop streaming, for which mapred.child.java.opts doesn't apply to
>>>>> the jvm you care about? BTW, what's the hadoop version you're using?
>>>>>
>>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>>>>> mapred.child.java.opts=-Xmx3000M it runs out of heap space. I have
>>>>>> tried other programs in Hadoop with the same settings, so the memory
>>>>>> is available on my machines.
>>>>>>
>>>>>> public static void main(String[] args) {
>>>>>>     try{
>>>>>>         String myFile = "xxx.dat";
>>>>>>         FileInputStream fin = new FileInputStream(myFile);
>>>>>>         ois = new ObjectInputStream(fin);
>>>>>>         margintagMap = ois.readObject();
>>>>>>         ois.close();
>>>>>>         fin.close();
>>>>>>     }catch(Exception e){
>>>>>>         //
>>>>>>     }
>>>>>> }
>>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>> As a follow-up to my own question, I think invoking the JVM in Hadoop
>>>>>>>> requires much more memory than an ordinary JVM.
>>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>>>> don't need to increase it. Please post the code reading the object (in
>>>>>>> hdfs?) in your tasks.
>>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>>>> the result after I try the MapFile approach.
>>>>>>>>
>>>>>>>> Shi
>>>>>>>>
>>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>>>>>> JAVA by setting -Xmx as 1000M. However, in hadoop I could never
>>>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>>>> on a single node, how much memory does it require (generally) in
>>>>>>>>>>> Hadoop?
>>>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>>>> configured), you will hit this.
>>>>>>>>>
>>>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>>>> the JVM.
>>>>>>>>>
>>>>>>>>> -Srivas.
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> Shi
>>>>>> --
>>>>>> Postdoctoral Scholar
>>>>>> Institute for Genomics and Systems Biology
>>>>>> Department of Medicine, the University of Chicago
>>>>>> Knapp Center for Biomedical Discovery
>>>>>> 900 E. 57th St. Room 10148
>>>>>> Chicago, IL 60637, US
>>>>>> Tel: 773-702-6799
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>> --
>>>> Postdoctoral Scholar
>>>> Institute for Genomics and Systems Biology
>>>> Department of Medicine, the University of Chicago
>>>> Knapp Center for Biomedical Discovery
>>>> 900 E. 57th St. Room 10148
>>>> Chicago, IL 60637, US
>>>> Tel: 773-702-6799
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu <sh...@uchicago.edu> wrote:
> Hi, thanks for the advice. I tried with your settings,
> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
> still no effect. Or this is a system variable? Should I export it? How to
> configure it?
HADOOP_CLIENT_OPTS is an environment variable, so you should run it as
HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
if you use sh-derivative shells (bash, ksh, etc.); prepend env for other shells.
__Luke
> Shi
>
> java -Xms3G -Xmx3G -classpath
> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
> OOloadtest
>
>
> On 2010-10-13 15:28, Luke Lu wrote:
>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>> try to invoke the same java class using the bin/hadoop command. The thing
>>> is, a very simple program can be executed in Java, but is not doable via
>>> the bin/hadoop command.
>> If you are just trying to use the bin/hadoop jar your.jar command, your
>> code runs in a local client jvm and mapred.child.java.opts has no
>> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
>> jar your.jar
>>> I think if I couldn't get through the first stage, even if I had a
>>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>>
>>> Best Regards,
>>>
>>> Shi
>>>
>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>> Can you post your mapper/reducer implementation? Or are you using
>>>> hadoop streaming, for which mapred.child.java.opts doesn't apply to
>>>> the jvm you care about? BTW, what's the hadoop version you're using?
>>>>
>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>>>> mapred.child.java.opts=-Xmx3000M it runs out of heap space. I have
>>>>> tried other programs in Hadoop with the same settings, so the memory
>>>>> is available on my machines.
>>>>>
>>>>> public static void main(String[] args) {
>>>>>     try{
>>>>>         String myFile = "xxx.dat";
>>>>>         FileInputStream fin = new FileInputStream(myFile);
>>>>>         ois = new ObjectInputStream(fin);
>>>>>         margintagMap = ois.readObject();
>>>>>         ois.close();
>>>>>         fin.close();
>>>>>     }catch(Exception e){
>>>>>         //
>>>>>     }
>>>>> }
>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>> As a follow-up to my own question, I think invoking the JVM in Hadoop
>>>>>>> requires much more memory than an ordinary JVM.
>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>>> don't need to increase it. Please post the code reading the object (in
>>>>>> hdfs?) in your tasks.
>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>>> the result after I try the MapFile approach.
>>>>>>>
>>>>>>> Shi
>>>>>>>
>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>>>>> JAVA by setting -Xmx as 1000M. However, in hadoop I could never
>>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>>> on a single node, how much memory does it require (generally) in
>>>>>>>>>> Hadoop?
>>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>>> configured), you will hit this.
>>>>>>>>
>>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>>> the JVM.
>>>>>>>>
>>>>>>>> -Srivas.
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Shi
>>>>> --
>>>>> Postdoctoral Scholar
>>>>> Institute for Genomics and Systems Biology
>>>>> Department of Medicine, the University of Chicago
>>>>> Knapp Center for Biomedical Discovery
>>>>> 900 E. 57th St. Room 10148
>>>>> Chicago, IL 60637, US
>>>>> Tel: 773-702-6799
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> Postdoctoral Scholar
>>> Institute for Genomics and Systems Biology
>>> Department of Medicine, the University of Chicago
>>> Knapp Center for Biomedical Discovery
>>> 900 E. 57th St. Room 10148
>>> Chicago, IL 60637, US
>>> Tel: 773-702-6799
>>>
>>>
>>>
>
>
>
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Hi, thanks for the advice. I tried with your settings,
$ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
still with no effect. Or is this a system variable? Should I export it?
How do I configure it?
Shi
java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
OOloadtest
On 2010-10-13 15:28, Luke Lu wrote:
> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> I haven't implemented anything in map/reduce yet for this issue. I just try
>> to invoke the same java class using bin/hadoop command. The thing is a
>> very simple program could be executed in Java, but not doable in bin/hadoop
>> command.
>>
> If you are just trying to use bin/hadoop jar your.jar command, your
> code runs in a local client jvm and mapred.child.java.opts has no
> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
> jar your.jar
>
>
>> I think if I couldn't get through the first stage, even I had a
>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>
>> Best Regards,
>>
>> Shi
>>
>> On 2010-10-13 14:15, Luke Lu wrote:
>>
>>> Can you post your mapper/reducer implementation? or are you using
>>> hadoop streaming? for which mapred.child.java.opts doesn't apply to
>>> the jvm you care about. BTW, what's the hadoop version you're using?
>>>
>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>> using
>>>> java -Xmx1000m , however, when using bin/hadoop -D
>>>> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. I
>>>> have tried other program in Hadoop with the same settings so the memory
>>>> is
>>>> available in my machines.
>>>>
>>>>
>>>> public static void main(String[] args) {
>>>> try{
>>>> String myFile = "xxx.dat";
>>>> FileInputStream fin = new FileInputStream(myFile);
>>>> ois = new ObjectInputStream(fin);
>>>> margintagMap = ois.readObject();
>>>> ois.close();
>>>> fin.close();
>>>> }catch(Exception e){
>>>> //
>>>> }
>>>> }
>>>>
>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>> As a follow-up to my own question, I think invoking the JVM in Hadoop
>>>>>> requires much more memory than an ordinary JVM.
>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>> don't need to increase it. Please post the code reading the object (in
>>>>> hdfs?) in your tasks.
>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>> the result after I try the MapFile approach.
>>>>>>
>>>>>> Shi
>>>>>>
>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>>>> JAVA by setting -Xmx as 1000M. However, in hadoop I could never
>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>> on a single node, how much memory does it require (generally) in
>>>>>>>>> Hadoop?
>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>> configured), you will hit this.
>>>>>>>
>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>> the JVM.
>>>>>>>
>>>>>>> -Srivas.
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Shi
>>>> --
>>>> Postdoctoral Scholar
>>>> Institute for Genomics and Systems Biology
>>>> Department of Medicine, the University of Chicago
>>>> Knapp Center for Biomedical Discovery
>>>> 900 E. 57th St. Room 10148
>>>> Chicago, IL 60637, US
>>>> Tel: 773-702-6799
>>>>
>>>>
>>>>
>>>>
>>
>> --
>> Postdoctoral Scholar
>> Institute for Genomics and Systems Biology
>> Department of Medicine, the University of Chicago
>> Knapp Center for Biomedical Discovery
>> 900 E. 57th St. Room 10148
>> Chicago, IL 60637, US
>> Tel: 773-702-6799
>>
>>
>>
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu <sh...@uchicago.edu> wrote:
> I haven't implemented anything in map/reduce yet for this issue. I just try
> to invoke the same java class using bin/hadoop command. The thing is a
> very simple program could be executed in Java, but not doable in bin/hadoop
> command.
If you are just trying to use the bin/hadoop jar your.jar command, your
code runs in a local client JVM and mapred.child.java.opts has no
effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
jar your.jar
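A quick way to verify which heap setting actually took effect is to print the running JVM's maximum heap at startup. This is a minimal sketch (the class name HeapCheck is made up for illustration):

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Reports the max heap of the JVM this code actually runs in,
        // so you can see whether HADOOP_CLIENT_OPTS (or -Xmx) was applied.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap = " + maxMb + " MB");
    }
}
```

Running it once with plain java -Xmx and once through bin/hadoop should make the difference between the client JVM and a task JVM obvious.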
> I think if I couldn't get through the first stage, even I had a
> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>
> Best Regards,
>
> Shi
>
> On 2010-10-13 14:15, Luke Lu wrote:
>>
>> Can you post your mapper/reducer implementation? or are you using
>> hadoop streaming? for which mapred.child.java.opts doesn't apply to
>> the jvm you care about. BTW, what's the hadoop version you're using?
>>
>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>
>>>
>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>> using
>>> java -Xmx1000m , however, when using bin/hadoop -D
>>> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. I
>>> have tried other program in Hadoop with the same settings so the memory
>>> is
>>> available in my machines.
>>>
>>>
>>> public static void main(String[] args) {
>>> try{
>>> String myFile = "xxx.dat";
>>> FileInputStream fin = new FileInputStream(myFile);
>>> ois = new ObjectInputStream(fin);
>>> margintagMap = ois.readObject();
>>> ois.close();
>>> fin.close();
>>> }catch(Exception e){
>>> //
>>> }
>>> }
>>>
>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>
>>>>
>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>
>>>>
>>>>>
>>>>> As a coming-up to the my own question, I think to invoke the JVM in
>>>>> Hadoop
>>>>> requires much more memory than an ordinary JVM.
>>>>>
>>>>>
>>>>
>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>> is much smaller than the standard jvm default 512M and most users
>>>> don't need to increase it. Please post the code reading the object (in
>>>> hdfs?) in your tasks.
>>>>
>>>>
>>>>
>>>>>
>>>>> I found that instead of
>>>>> serialization the object, maybe I could create a MapFile as an index to
>>>>> permit lookups by key in Hadoop. I have also compared the performance
>>>>> of
>>>>> MongoDB and Memcache. I will let you know the result after I try the
>>>>> MapFile
>>>>> approach.
>>>>>
>>>>> Shi
>>>>>
>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>
>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>> stored
>>>>>>>> object is 200M. I could read that object efficiently in JAVA by
>>>>>>>> setting
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> -Xmx
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>>>>>> code
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> is
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> map/reduce
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get
>>>>>>>> the
>>>>>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain
>>>>>>>> a
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> little
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so
>>>>>>>> much
>>>>>>>> memory? If a program requires 1G memory on a single node, how much
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> memory
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> it requires (generally) in Hadoop?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>> process. If your swap is too low (or do not have any swap configured),
>>>>>> you
>>>>>> will hit this.
>>>>>>
>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>> the
>>>>>> JVM.
>>>>>>
>>>>>> -Srivas.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Shi
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> Postdoctoral Scholar
>>> Institute for Genomics and Systems Biology
>>> Department of Medicine, the University of Chicago
>>> Knapp Center for Biomedical Discovery
>>> 900 E. 57th St. Room 10148
>>> Chicago, IL 60637, US
>>> Tel: 773-702-6799
>>>
>>>
>>>
>
>
> --
> Postdoctoral Scholar
> Institute for Genomics and Systems Biology
> Department of Medicine, the University of Chicago
> Knapp Center for Biomedical Discovery
> 900 E. 57th St. Room 10148
> Chicago, IL 60637, US
> Tel: 773-702-6799
>
>
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
I haven't implemented anything in map/reduce yet for this issue. I just
tried to invoke the same Java class using the bin/hadoop command. The
thing is that a very simple program runs fine in plain Java but is not
doable through the bin/hadoop command. I think that if I can't get
through this first stage, even a map/reduce program would also fail. I
am using Hadoop 0.19.2. Thanks.
Best Regards,
Shi
On 2010-10-13 14:15, Luke Lu wrote:
> Can you post your mapper/reducer implementation? or are you using
> hadoop streaming? for which mapred.child.java.opts doesn't apply to
> the jvm you care about. BTW, what's the hadoop version you're using?
>
> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> Here is my code. There is no Map/Reduce in it. I could run this code using
>> java -Xmx1000m , however, when using bin/hadoop -D
>> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. I
>> have tried other program in Hadoop with the same settings so the memory is
>> available in my machines.
>>
>>
>> public static void main(String[] args) {
>> try{
>> String myFile = "xxx.dat";
>> FileInputStream fin = new FileInputStream(myFile);
>> ois = new ObjectInputStream(fin);
>> margintagMap = ois.readObject();
>> ois.close();
>> fin.close();
>> }catch(Exception e){
>> //
>> }
>> }
>>
>> On 2010-10-13 13:30, Luke Lu wrote:
>>
>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>> As a coming-up to the my own question, I think to invoke the JVM in
>>>> Hadoop
>>>> requires much more memory than an ordinary JVM.
>>>>
>>>>
>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>> is much smaller than the standard jvm default 512M and most users
>>> don't need to increase it. Please post the code reading the object (in
>>> hdfs?) in your tasks.
>>>
>>>
>>>
>>>> I found that instead of
>>>> serialization the object, maybe I could create a MapFile as an index to
>>>> permit lookups by key in Hadoop. I have also compared the performance of
>>>> MongoDB and Memcache. I will let you know the result after I try the
>>>> MapFile
>>>> approach.
>>>>
>>>> Shi
>>>>
>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>
>>>>
>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>> stored
>>>>>>> object is 200M. I could read that object efficiently in JAVA by
>>>>>>> setting
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> -Xmx
>>>>>>
>>>>>>
>>>>>>
>>>>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>>>>> code
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> is
>>>>>>
>>>>>>
>>>>>>
>>>>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> map/reduce
>>>>>>
>>>>>>
>>>>>>
>>>>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get
>>>>>>> the
>>>>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> little
>>>>>>
>>>>>>
>>>>>>
>>>>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so
>>>>>>> much
>>>>>>> memory? If a program requires 1G memory on a single node, how much
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> memory
>>>>>>
>>>>>>
>>>>>>
>>>>>>> it requires (generally) in Hadoop?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>> process. If your swap is too low (or do not have any swap configured),
>>>>> you
>>>>> will hit this.
>>>>>
>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in the
>>>>> JVM.
>>>>>
>>>>> -Srivas.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Shi
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
>> --
>> Postdoctoral Scholar
>> Institute for Genomics and Systems Biology
>> Department of Medicine, the University of Chicago
>> Knapp Center for Biomedical Discovery
>> 900 E. 57th St. Room 10148
>> Chicago, IL 60637, US
>> Tel: 773-702-6799
>>
>>
>>
--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
Can you post your mapper/reducer implementation? or are you using
hadoop streaming? for which mapred.child.java.opts doesn't apply to
the jvm you care about. BTW, what's the hadoop version you're using?
On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu <sh...@uchicago.edu> wrote:
> Here is my code. There is no Map/Reduce in it. I could run this code using
> java -Xmx1000m , however, when using bin/hadoop -D
> mapred.child.java.opts=-Xmx3000M it has heap space not enough error. I
> have tried other program in Hadoop with the same settings so the memory is
> available in my machines.
>
>
> public static void main(String[] args) {
> try{
> String myFile = "xxx.dat";
> FileInputStream fin = new FileInputStream(myFile);
> ois = new ObjectInputStream(fin);
> margintagMap = ois.readObject();
> ois.close();
> fin.close();
> }catch(Exception e){
> //
> }
> }
>
> On 2010-10-13 13:30, Luke Lu wrote:
>>
>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>
>>>
>>> As a coming-up to the my own question, I think to invoke the JVM in
>>> Hadoop
>>> requires much more memory than an ordinary JVM.
>>>
>>
>> That's simply not true. The default mapreduce task Xmx is 200M, which
>> is much smaller than the standard jvm default 512M and most users
>> don't need to increase it. Please post the code reading the object (in
>> hdfs?) in your tasks.
>>
>>
>>>
>>> I found that instead of
>>> serialization the object, maybe I could create a MapFile as an index to
>>> permit lookups by key in Hadoop. I have also compared the performance of
>>> MongoDB and Memcache. I will let you know the result after I try the
>>> MapFile
>>> approach.
>>>
>>> Shi
>>>
>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>
>>>>>
>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>> stored
>>>>>> object is 200M. I could read that object efficiently in JAVA by
>>>>>> setting
>>>>>>
>>>>>>
>>>>>
>>>>> -Xmx
>>>>>
>>>>>
>>>>>>
>>>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>>>> code
>>>>>>
>>>>>>
>>>>>
>>>>> is
>>>>>
>>>>>
>>>>>>
>>>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>>>
>>>>>>
>>>>>
>>>>> map/reduce
>>>>>
>>>>>
>>>>>>
>>>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get
>>>>>> the
>>>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>>>>
>>>>>>
>>>>>
>>>>> little
>>>>>
>>>>>
>>>>>>
>>>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so
>>>>>> much
>>>>>> memory? If a program requires 1G memory on a single node, how much
>>>>>>
>>>>>>
>>>>>
>>>>> memory
>>>>>
>>>>>
>>>>>>
>>>>>> it requires (generally) in Hadoop?
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> The JVM reserves swap space in advance, at the time of launching the
>>>> process. If your swap is too low (or do not have any swap configured),
>>>> you
>>>> will hit this.
>>>>
>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in the
>>>> JVM.
>>>>
>>>> -Srivas.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Shi
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>
>
> --
> Postdoctoral Scholar
> Institute for Genomics and Systems Biology
> Department of Medicine, the University of Chicago
> Knapp Center for Biomedical Discovery
> 900 E. 57th St. Room 10148
> Chicago, IL 60637, US
> Tel: 773-702-6799
>
>
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
Here is my code. There is no Map/Reduce in it. I can run this code
using java -Xmx1000m; however, when using bin/hadoop -D
mapred.child.java.opts=-Xmx3000M it fails with a Java heap space error.
I have tried other programs in Hadoop with the same settings, so the
memory is available on my machines.

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.HashMap;

public static void main(String[] args) {
    try {
        String myFile = "xxx.dat";
        FileInputStream fin = new FileInputStream(myFile);
        ObjectInputStream ois = new ObjectInputStream(fin);
        // readObject() returns Object, so a cast is needed to get the map back
        HashMap<?, ?> margintagMap = (HashMap<?, ?>) ois.readObject();
        ois.close();
        fin.close();
    } catch (Exception e) {
        e.printStackTrace(); // don't swallow the error silently
    }
}
On 2010-10-13 13:30, Luke Lu wrote:
> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<sh...@uchicago.edu> wrote:
>
>> As a coming-up to the my own question, I think to invoke the JVM in Hadoop
>> requires much more memory than an ordinary JVM.
>>
> That's simply not true. The default mapreduce task Xmx is 200M, which
> is much smaller than the standard jvm default 512M and most users
> don't need to increase it. Please post the code reading the object (in
> hdfs?) in your tasks.
>
>
>> I found that instead of
>> serialization the object, maybe I could create a MapFile as an index to
>> permit lookups by key in Hadoop. I have also compared the performance of
>> MongoDB and Memcache. I will let you know the result after I try the MapFile
>> approach.
>>
>> Shi
>>
>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>
>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to load a serialized HashMap object in hadoop. The file of stored
>>>>> object is 200M. I could read that object efficiently in JAVA by setting
>>>>>
>>>>>
>>>> -Xmx
>>>>
>>>>
>>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>>> code
>>>>>
>>>>>
>>>> is
>>>>
>>>>
>>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>>
>>>>>
>>>> map/reduce
>>>>
>>>>
>>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
>>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>>>
>>>>>
>>>> little
>>>>
>>>>
>>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
>>>>> memory? If a program requires 1G memory on a single node, how much
>>>>>
>>>>>
>>>> memory
>>>>
>>>>
>>>>> it requires (generally) in Hadoop?
>>>>>
>>>>>
>>>>
>>>>
>>> The JVM reserves swap space in advance, at the time of launching the
>>> process. If your swap is too low (or do not have any swap configured), you
>>> will hit this.
>>>
>>> Or, you are on a 32-bit machine, in which case 3G is not possible in the
>>> JVM.
>>>
>>> -Srivas.
>>>
>>>
>>>
>>>
>>>
>>>>> Thanks.
>>>>>
>>>>> Shi
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799
Re: load a serialized object in hadoop
Posted by Luke Lu <ll...@vicaya.com>.
On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu <sh...@uchicago.edu> wrote:
> As a coming-up to the my own question, I think to invoke the JVM in Hadoop
> requires much more memory than an ordinary JVM.
That's simply not true. The default mapreduce task Xmx is 200M, which
is much smaller than the standard JVM default of 512M, and most users
don't need to increase it. Please post the code that reads the object
(from HDFS?) in your tasks.
> I found that instead of
> serialization the object, maybe I could create a MapFile as an index to
> permit lookups by key in Hadoop. I have also compared the performance of
> MongoDB and Memcache. I will let you know the result after I try the MapFile
> approach.
>
> Shi
>
> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>
>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>>
>>>> Hi,
>>>>
>>>> I want to load a serialized HashMap object in hadoop. The file of stored
>>>> object is 200M. I could read that object efficiently in JAVA by setting
>>>>
>>>
>>> -Xmx
>>>
>>>>
>>>> as 1000M. However, in hadoop I could never load it into memory. The
>>>> code
>>>>
>>>
>>> is
>>>
>>>>
>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>
>>>
>>> map/reduce
>>>
>>>>
>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>>
>>>
>>> little
>>>
>>>>
>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
>>>> memory? If a program requires 1G memory on a single node, how much
>>>>
>>>
>>> memory
>>>
>>>>
>>>> it requires (generally) in Hadoop?
>>>>
>>>
>>>
>>
>> The JVM reserves swap space in advance, at the time of launching the
>> process. If your swap is too low (or do not have any swap configured), you
>> will hit this.
>>
>> Or, you are on a 32-bit machine, in which case 3G is not possible in the
>> JVM.
>>
>> -Srivas.
>>
>>
>>
>>
>>>>
>>>> Thanks.
>>>>
>>>> Shi
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
Re: load a serialized object in hadoop
Posted by Matt Pouttu-Clarke <Ma...@icrossing.com>.
Also, Java serialization often keeps references to previously read
objects in memory. It is better to use Thrift or Avro to serialize the
object.
In my experience serialization is inefficient for large object graphs,
but works fine for smaller graphs (depending on how much memory you
have to work with).
Also, for that small an amount of data, Memcache and MongoDB may be
overkill (unless the data changes frequently).
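One dependency-free way to sidestep the reference tracking described above is to write the map as flat key/value records with DataOutputStream instead of serializing the HashMap object itself. A rough sketch (class and file names are illustrative):

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class FlatMapFile {
    // Write the map as an entry count followed by flat key/value pairs.
    // Note: writeUTF caps each string at 65,535 encoded bytes.
    static void write(Map<String, String> map, String path) throws IOException {
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)));
        out.writeInt(map.size());
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        out.close();
    }

    // Read it back without ObjectInputStream's per-object bookkeeping.
    static HashMap<String, String> read(String path) throws IOException {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)));
        int n = in.readInt();
        HashMap<String, String> map = new HashMap<String, String>(n * 2);
        for (int i = 0; i < n; i++) {
            map.put(in.readUTF(), in.readUTF());
        }
        in.close();
        return map;
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, String> m = new HashMap<String, String>();
        m.put("token", "tag");
        write(m, "flat.dat");
        System.out.println(read("flat.dat").equals(m)); // prints "true"
    }
}
```

The reader allocates only the map entries themselves, so peak memory tracks the data rather than the serialization machinery.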
Cheers,
Matt
On Oct 13, 2010, at 11:04 AM, "Shi Yu" <sh...@uchicago.edu> wrote:
> As a coming-up to the my own question, I think to invoke the JVM in
> Hadoop requires much more memory than an ordinary JVM. I found that
> instead of serialization the object, maybe I could create a MapFile
> as an index to permit lookups by key in Hadoop. I have also compared
> the performance of MongoDB and Memcache. I will let you know the
> result after I try the MapFile approach.
>
> Shi
>
> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>
>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>> stored
>>>> object is 200M. I could read that object efficiently in JAVA by
>>>> setting
>>>>
>>> -Xmx
>>>
>>>> as 1000M. However, in hadoop I could never load it into memory.
>>>> The code
>>>>
>>> is
>>>
>>>> very simple (just read the ObjectInputStream) and there is yet no
>>>>
>>> map/reduce
>>>
>>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still
>>>> get the
>>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone
>>>> explain a
>>>>
>>> little
>>>
>>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up
>>>> so much
>>>> memory? If a program requires 1G memory on a single node, how much
>>>>
>>> memory
>>>
>>>> it requires (generally) in Hadoop?
>>>>
>>>
>> The JVM reserves swap space in advance, at the time of launching the
>> process. If your swap is too low (or do not have any swap
>> configured), you
>> will hit this.
>>
>> Or, you are on a 32-bit machine, in which case 3G is not possible
>> in the
>> JVM.
>>
>> -Srivas.
>>
>>
>>
>>
>>>> Thanks.
>>>>
>>>> Shi
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>
>>
>
>
Re: load a serialized object in hadoop
Posted by Shi Yu <sh...@uchicago.edu>.
As a follow-up to my own question: I think invoking the JVM in Hadoop
requires much more memory than an ordinary JVM. I found that instead of
serializing the object, maybe I could create a MapFile as an index to
permit lookups by key in Hadoop. I have also compared the performance
of MongoDB and Memcache. I will let you know the result after I try the
MapFile approach.
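For the record, a minimal sketch of the MapFile approach against the 0.19-era API might look like the following; the directory name and Text key/value types are assumptions, and note that MapFile requires keys to be appended in sorted order:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);

        // Build the index once; keys MUST be appended in sorted order.
        MapFile.Writer writer =
                new MapFile.Writer(conf, fs, "lookup.map", Text.class, Text.class);
        writer.append(new Text("apple"), new Text("fruit"));
        writer.append(new Text("beet"), new Text("vegetable"));
        writer.close();

        // Later (e.g. inside a mapper), look up individual keys without
        // deserializing the whole map into the heap.
        MapFile.Reader reader = new MapFile.Reader(fs, "lookup.map", conf);
        Text value = new Text();
        reader.get(new Text("beet"), value);
        System.out.println(value);
        reader.close();
    }
}
```

Only the MapFile's in-memory index (a sample of the keys) is held on the heap, so lookups stay cheap even when the data file is large.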
Shi
On 2010-10-12 21:59, M. C. Srivas wrote:
>>
>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<sh...@uchicago.edu> wrote:
>>
>>
>>> Hi,
>>>
>>> I want to load a serialized HashMap object in hadoop. The file of stored
>>> object is 200M. I could read that object efficiently in JAVA by setting
>>>
>> -Xmx
>>
>>> as 1000M. However, in hadoop I could never load it into memory. The code
>>>
>> is
>>
>>> very simple (just read the ObjectInputStream) and there is yet no
>>>
>> map/reduce
>>
>>> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
>>> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
>>>
>> little
>>
>>> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
>>> memory? If a program requires 1G memory on a single node, how much
>>>
>> memory
>>
>>> it requires (generally) in Hadoop?
>>>
>>
> The JVM reserves swap space in advance, at the time of launching the
> process. If your swap is too low (or do not have any swap configured), you
> will hit this.
>
> Or, you are on a 32-bit machine, in which case 3G is not possible in the
> JVM.
>
> -Srivas.
>
>
>
>
>>> Thanks.
>>>
>>> Shi
>>>
>>> --
>>>
>>>
>>>
>>
>
Re: load a serialized object in hadoop
Posted by "M. C. Srivas" <mc...@gmail.com>.
>
>
> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <sh...@uchicago.edu> wrote:
>
> > Hi,
> >
> > I want to load a serialized HashMap object in hadoop. The file of stored
> > object is 200M. I could read that object efficiently in JAVA by setting
> -Xmx
> > as 1000M. However, in hadoop I could never load it into memory. The code
> is
> > very simple (just read the ObjectInputStream) and there is yet no
> map/reduce
> > implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
> > "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a
> little
> > bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
> > memory? If a program requires 1G memory on a single node, how much
> memory
> > it requires (generally) in Hadoop?
>
The JVM reserves swap space in advance, at the time of launching the
process. If your swap is too low (or you do not have any swap
configured), you will hit this.
Or, you are on a 32-bit machine, in which case 3G is not possible in the
JVM.
-Srivas.
> >
> > Thanks.
> >
> > Shi
> >
> > --
> >
> >
>
Re: load a serialized object in hadoop
Posted by Charles Lee <li...@gmail.com>.
On a 32-bit machine, the biggest heap the JVM can provide is in the
range of 1.5 GB to 2.0 GB. So if you want a bigger heap, say 3000M, you
should use a 64-bit machine.
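A quick way to check which data model the JVM itself is running under; note that sun.arch.data.model is a Sun/Oracle-specific property and may be absent on other JVMs, hence the fallback:

```java
public class ArchCheck {
    public static void main(String[] args) {
        // "sun.arch.data.model" is "32" or "64" on Sun/Oracle JVMs;
        // fall back to os.arch on JVMs that don't define it.
        String bits = System.getProperty("sun.arch.data.model",
                                         System.getProperty("os.arch"));
        System.out.println("JVM data model: " + bits);
    }
}
```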
On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <sh...@uchicago.edu> wrote:
> Hi,
>
> I want to load a serialized HashMap object in hadoop. The file of stored
> object is 200M. I could read that object efficiently in JAVA by setting -Xmx
> as 1000M. However, in hadoop I could never load it into memory. The code is
> very simple (just read the ObjectInputStream) and there is yet no map/reduce
> implemented. I set the mapred.child.java.opts=-Xmx3000M, still get the
> "java.lang.OutOfMemoryError: Java heap space" Could anyone explain a little
> bit how memory is allocate to JVM in hadoop. Why hadoop takes up so much
> memory? If a program requires 1G memory on a single node, how much memory
> it requires (generally) in Hadoop?
>
> Thanks.
>
> Shi
>
> --
>
>
--
Yours sincerely,
Charles Lee