Posted to common-user@hadoop.apache.org by "Xie, Tao" <xi...@gmail.com> on 2009/04/27 11:22:03 UTC

write a large file to HDFS?

hi, 
If I write a large file to HDFS, will it be split into blocks, with multiple
blocks written to HDFS at the same time? Or can HDFS only write block by
block?
Thanks.
-- 
View this message in context: http://www.nabble.com/write-a-large-file-to-HDFS--tp23252754p23252754.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: What's the local heap size of Hadoop? How to increase it?

Posted by "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu>.
Yeah, it works!
Bhupesh and Jason, thanks a lot!

Jasmine
----- Original Message ----- 
From: "jason hadoop" <ja...@gmail.com>
To: <co...@hadoop.apache.org>
Sent: Wednesday, April 29, 2009 11:19 PM
Subject: Re: What's the local heap size of Hadoop? How to increase it?


> You can also specify it on the command line of your hadoop job:
>
>   hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M other arguments
>
> Note: there is a space between the -D and mapred, and the -D has to come
> after the main class specification.
>
> This parameter may also be specified via
> conf.set("mapred.child.java.opts", "-Xmx800m"); before submitting the job.
>
> On Wed, Apr 29, 2009 at 2:59 PM, Bhupesh Bansal 
> <bb...@linkedin.com> wrote:
>
>> Hey,
>>
>> Try adding
>>
>>  <property>
>>    <name>mapred.child.java.opts</name>
>>    <value>-Xmx800M -server</value>
>>  </property>
>>
>> with the right JVM size to your hadoop-site.xml. You will have to copy
>> this file to all mapred nodes and restart the cluster.
>>
>> Best
>> Bhupesh
>>
>>
>>
>> On 4/29/09 2:03 PM, "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu>
>> wrote:
>>
>> > Hi there,
>> >
>> > What is the local heap size of Hadoop? I have tried to load a local
>> > cache file composed of 500,000 short phrases, but the task failed. The
>> > output of Hadoop looks like this (com.aliasi.dict.ExactDictionaryChunker
>> > is a third-party jar package, and the records are organized as a trie
>> > structure):
>> >
>> > java.lang.OutOfMemoryError: Java heap space
>> >         at java.util.HashMap.addEntry(HashMap.java:753)
>> >         at java.util.HashMap.put(HashMap.java:385)
>> >         at com.aliasi.dict.ExactDictionaryChunker$TrieNode.getOrCreateDaughter(ExactDictionaryChunker.java:476)
>> >         at com.aliasi.dict.ExactDictionaryChunker$TrieNode.add(ExactDictionaryChunker.java:484)
>> >
>> > When I reduced the total record number to 30,000, my MapReduce job
>> > succeeded. So I have a question: what is the local heap size of Hadoop's
>> > Java Virtual Machine, and how do I increase it?
>> >
>> > Best,
>> > Jasmine
>> >
>>
>>
>
>
> -- 
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
> 


Re: What's the local heap size of Hadoop? How to increase it?

Posted by jason hadoop <ja...@gmail.com>.
You can also specify it on the command line of your hadoop job:

  hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M other arguments

Note: there is a space between the -D and mapred, and the -D has to come
after the main class specification.

This parameter may also be specified via
conf.set("mapred.child.java.opts", "-Xmx800m"); before submitting the job.

On Wed, Apr 29, 2009 at 2:59 PM, Bhupesh Bansal <bb...@linkedin.com> wrote:

> Hey,
>
> Try adding
>
>  <property>
>    <name>mapred.child.java.opts</name>
>    <value>-Xmx800M -server</value>
>  </property>
>
> with the right JVM size to your hadoop-site.xml. You will have to copy
> this file to all mapred nodes and restart the cluster.
>
> Best
> Bhupesh
>
>
>
> On 4/29/09 2:03 PM, "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu>
> wrote:
>
> > Hi there,
> >
> > What is the local heap size of Hadoop? I have tried to load a local cache
> > file composed of 500,000 short phrases, but the task failed. The output
> > of Hadoop looks like this (com.aliasi.dict.ExactDictionaryChunker is a
> > third-party jar package, and the records are organized as a trie
> > structure):
> >
> > java.lang.OutOfMemoryError: Java heap space
> >         at java.util.HashMap.addEntry(HashMap.java:753)
> >         at java.util.HashMap.put(HashMap.java:385)
> >         at com.aliasi.dict.ExactDictionaryChunker$TrieNode.getOrCreateDaughter(ExactDictionaryChunker.java:476)
> >         at com.aliasi.dict.ExactDictionaryChunker$TrieNode.add(ExactDictionaryChunker.java:484)
> >
> > When I reduced the total record number to 30,000, my MapReduce job
> > succeeded. So I have a question: what is the local heap size of Hadoop's
> > Java Virtual Machine, and how do I increase it?
> >
> > Best,
> > Jasmine
> >
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: What's the local heap size of Hadoop? How to increase it?

Posted by Bhupesh Bansal <bb...@linkedin.com>.
Hey,

Try adding 

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx800M -server</value>
  </property>
 
with the right JVM size to your hadoop-site.xml. You will have to copy this
file to all mapred nodes and restart the cluster.
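
If you want to confirm the setting actually reached the task JVMs, something
like the sketch below can report the heap ceiling each child was started
with (the mapper class and its key/value types are made up, purely
illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class HeapCheckMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, LongWritable, Text> {

  public void map(LongWritable key, Text value,
                  OutputCollector<LongWritable, Text> output,
                  Reporter reporter) throws IOException {
    // maxMemory() reflects the -Xmx the child JVM was started with.
    long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    reporter.setStatus("max heap = " + maxHeapMb + " MB");
    output.collect(key, value);
  }
}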

Best
Bhupesh



On 4/29/09 2:03 PM, "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu> wrote:

> Hi there,
>
> What is the local heap size of Hadoop? I have tried to load a local cache
> file composed of 500,000 short phrases, but the task failed. The output of
> Hadoop looks like this (com.aliasi.dict.ExactDictionaryChunker is a
> third-party jar package, and the records are organized as a trie structure):
>
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.HashMap.addEntry(HashMap.java:753)
>         at java.util.HashMap.put(HashMap.java:385)
>         at com.aliasi.dict.ExactDictionaryChunker$TrieNode.getOrCreateDaughter(ExactDictionaryChunker.java:476)
>         at com.aliasi.dict.ExactDictionaryChunker$TrieNode.add(ExactDictionaryChunker.java:484)
>
> When I reduced the total record number to 30,000, my MapReduce job
> succeeded. So I have a question: what is the local heap size of Hadoop's
> Java Virtual Machine, and how do I increase it?
> 
> Best,
> Jasmine 
> 


What's the local heap size of Hadoop? How to increase it?

Posted by "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu>.
Hi there,

What is the local heap size of Hadoop? I have tried to load a local cache
file composed of 500,000 short phrases, but the task failed. The output of
Hadoop looks like this (com.aliasi.dict.ExactDictionaryChunker is a
third-party jar package, and the records are organized as a trie structure):

java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.addEntry(HashMap.java:753)
        at java.util.HashMap.put(HashMap.java:385)
        at com.aliasi.dict.ExactDictionaryChunker$TrieNode.getOrCreateDaughter(ExactDictionaryChunker.java:476)
        at com.aliasi.dict.ExactDictionaryChunker$TrieNode.add(ExactDictionaryChunker.java:484)

When I reduced the total record number to 30,000, my MapReduce job
succeeded. So I have a question: what is the local heap size of Hadoop's
Java Virtual Machine, and how do I increase it?

Best,
Jasmine 


Re: write a large file to HDFS?

Posted by jason hadoop <ja...@gmail.com>.
Block by block.

Open multiple connections and write multiple files if you are not saturating
your network connection. Generally, a single file writer writing large blocks
rapidly will do a decent job of saturating things.
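
For what it's worth, a single writer is just a stream copy. Here is a
minimal sketch (the local and HDFS paths are made up): the client streams
bytes, and HDFS carves the stream into blocks (64 MB by default) behind the
scenes, one block after another.

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Local source file and HDFS target path (example names only).
    InputStream in = new FileInputStream("/tmp/bigfile.dat");
    FSDataOutputStream out = fs.create(new Path("/user/me/bigfile.dat"));

    // Copy the stream; the namenode allocates one block at a time and the
    // bytes flow through that block's datanode pipeline before the next
    // block is started.
    byte[] buf = new byte[64 * 1024];
    int n;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);
    }
    out.close();
    in.close();
  }
}

To get parallelism across files, you would simply run several such copies at
once, e.g. one per thread, each writing its own target path.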

On Mon, Apr 27, 2009 at 2:22 AM, Xie, Tao <xi...@gmail.com> wrote:

>
> hi,
> If I write a large file to HDFS, will it be split into blocks, with multiple
> blocks written to HDFS at the same time? Or can HDFS only write block by
> block?
> Thanks.
> --
> View this message in context:
> http://www.nabble.com/write-a-large-file-to-HDFS--tp23252754p23252754.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422