Posted to common-user@hadoop.apache.org by "nguyenhuynh.mr" <ng...@gmail.com> on 2009/04/29 04:47:24 UTC

How to write large string to file in HDFS

Hi all!


I have a large String that I want to write to a file in HDFS.

(The string has more than 100,000 lines.)


Currently, I use the copyBytes method of the class org.apache.hadoop.io.IOUtils.
However, copyBytes requires an InputStream for the content, so I have
to convert the String to an InputStream, something like:

    InputStream in = new ByteArrayInputStream(sb.toString().getBytes());

    Here "sb" is a StringBuffer.


The line above does not work. :(

This is the error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
    at java.lang.StringCoding.encode(StringCoding.java:272)
    at java.lang.String.getBytes(String.java:947)
    at asnet.haris.mapred.jobs.Test.main(Test.java:32)



Please suggest a good solution!


Thanks,

Best regards,

Nguyen




Re: How to write large string to file in HDFS

Posted by "nguyenhuynh.mr" <ng...@gmail.com>.
Thanks very much!

Best,
Nguyen.

Re: How to write large string to file in HDFS

Posted by Wang Zhong <wa...@gmail.com>.
You can try using FSDataOutputStream in the reduce phase. Create a file
with FSDataOutputStream as follows:

====
FileSystem fs = FileSystem.get(conf);
// fs.create returns an FSDataOutputStream; a plain OutputStream has no writeChars
FSDataOutputStream os = fs.create(path);
os.writeChars(str);  // writes each char as two bytes
os.close();
====

You should call writeChars in each iteration over your values instead of
using a StringBuffer. The key should be part of your file name to
indicate the group of URIs.
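
A minimal sketch of the per-value approach described above, assuming the reducer's values can be read as strings and that "out" is the stream returned by fs.create(path). The class GroupFileWriter, its method names, and the "group-<key>.txt" naming scheme are hypothetical, chosen only for illustration. It writes UTF-8 lines through a Writer rather than calling writeChars, because writeChars emits two bytes per character, which may not be the encoding wanted for a text file of URIs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical helper, not part of Hadoop: streams one group's values to a file.
final class GroupFileWriter {

    // Hypothetical naming scheme: the reduce key becomes part of the file name.
    static String fileNameFor(String groupName) {
        return "group-" + groupName + ".txt";
    }

    // Writes each value as one UTF-8 line as soon as it is read, so only the
    // current value and the writer's buffer are held in memory.
    // Assumption: in a reducer, "out" is the stream returned by fs.create(path).
    static long writeValues(Iterator<String> values, OutputStream out) throws IOException {
        long count = 0;
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        try {
            while (values.hasNext()) {
                writer.write(values.next());
                writer.write('\n');
                count++;
            }
        } finally {
            writer.close(); // flushes, then closes the underlying stream
        }
        return count;
    }
}
```

In a reducer this might be called as GroupFileWriter.writeValues(values, fs.create(new Path(dir, GroupFileWriter.fileNameFor(key)))), where "dir" is an assumed output directory and the values have already been converted to strings.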



-- 
Wang Zhong

Re: How to write large string to file in HDFS

Posted by "nguyenhuynh.mr" <ng...@gmail.com>.
Thanks for your answer!

I have a Map/Reduce job. It partitions URIs from HBase into groups.
In the map phase, I get the group name of each URI and collect the output
<groupname, uri>.
In the reduce phase, I get the String (the URIs of the partition) and save
it into HDFS.
Each group is written to its own file.

Thanks,

Best regards,
NguyenHuynh.


Re: How to write large string to file in HDFS

Posted by Wang Zhong <wa...@gmail.com>.
Where did you get the large string? Can't you generate the string one
line at a time and append it to a local file, then upload that file to
HDFS when finished?
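
A rough sketch of the local-file half of that idea, assuming the lines are available one at a time. The class LocalSpool and its method are hypothetical names used only for illustration. Each call appends to the file, so the full content never has to exist as a single String.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: spools lines to a local file before any upload.
final class LocalSpool {

    // Appends the given lines to localFile, creating it if it does not exist.
    static void appendLines(Path localFile, Iterable<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                localFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        }
    }
}
```

Once the file is complete, the upload step could use FileSystem.copyFromLocalFile(src, dst) with Hadoop's own org.apache.hadoop.fs.Path objects (a different class from the java.nio.file.Path used in this sketch).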


-- 
Wang Zhong

Re: How to write large string to file in HDFS

Posted by "nguyenhuynh.mr" <ng...@gmail.com>.
Thanks for your answer!

Best,
Nguyen,

Re: How to write large string to file in HDFS

Posted by jason hadoop <ja...@gmail.com>.
How about new InputStreamReader( new StringReader( String ), "UTF-8" )?
Replace UTF-8 with an appropriate charset.
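
One caveat: java.io.InputStreamReader wraps an InputStream, not a Reader, so the expression above would need adapting before it compiles. A possible alternative that keeps the explicit charset is sketched below. It goes in the other direction, encoding the StringBuffer in small chunks straight into the output stream, so no full-size byte[] copy is created. The class ChunkedStringCopy and the chunk size parameter are hypothetical, and "out" is assumed to be the HDFS output stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper: encodes a StringBuffer to a stream chunk by chunk.
final class ChunkedStringCopy {

    // Copies sb to out using the given charset, at most chunkChars chars at a time.
    // Only one char[] of size chunkChars is allocated besides the writer's buffer.
    static void copy(StringBuffer sb, OutputStream out, Charset charset, int chunkChars)
            throws IOException {
        char[] chunk = new char[chunkChars];
        Writer writer = new OutputStreamWriter(out, charset);
        int length = sb.length();
        for (int start = 0; start < length; start += chunkChars) {
            int end = Math.min(start + chunkChars, length);
            sb.getChars(start, end, chunk, 0);   // copy the next slice into the buffer
            writer.write(chunk, 0, end - start); // encode and write just that slice
        }
        writer.flush(); // the caller remains responsible for closing out
    }
}
```

A call might look like ChunkedStringCopy.copy(sb, os, StandardCharsets.UTF_8, 8192), followed by os.close().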



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422