Posted to mapreduce-user@hadoop.apache.org by 丛林 <co...@gmail.com> on 2011/05/12 01:48:38 UTC

How to create a SequenceFile faster?

Hi, all.

I want to write lots of little files (32GB in total) to HDFS as a
single org.apache.hadoop.io.SequenceFile.

But it is too slow: it takes about 8 hours to create this
SequenceFile (6.7GB).

So I wonder: how can I create this SequenceFile faster?

Thanks for your suggestions.

-Best Wishes,

-Lin

Re: How to create a SequenceFile faster?

Posted by Steve Lewis <lo...@gmail.com>.
Even for a single machine (and there may be reasons to use a single machine
if the original data is not splittable), our experience suggests it should
take about an hour to process 32 GB, which leads me to wonder whether
writing the SequenceFile is really your limiting step. Consider a very
simple job that writes 32 GB of random data, say a Long count and a random
double, to a SequenceFile, and run it on one box (you might also try the
same steps without the write) to see whether you are really limited by the
write. You might also consider enabling compression while writing the
SequenceFile.
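
A minimal sketch of that benchmark, assuming the 0.20-era
SequenceFile.createWriter API (the class name and the command-line output
path are made up for illustration):

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

// Hypothetical benchmark class: writes ~32 GB of random (long, double)
// records to a single SequenceFile to measure raw write throughput.
public class SequenceFileWriteBenchmark {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(args[0]); // output path passed on the command line

    // BLOCK compression batches many records per compressed block;
    // use CompressionType.NONE to measure raw, uncompressed write speed.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, DoubleWritable.class,
        SequenceFile.CompressionType.BLOCK);

    Random random = new Random();
    LongWritable key = new LongWritable();
    DoubleWritable value = new DoubleWritable();

    long start = System.currentTimeMillis();
    // 2 billion 16-byte (long, double) records is roughly 32 GB of
    // raw record data.
    for (long i = 0; i < 2000000000L; i++) {
      key.set(i);
      value.set(random.nextDouble());
      writer.append(key, value);
    }
    writer.close();

    long elapsed = System.currentTimeMillis() - start;
    System.out.println("Finished in " + (elapsed / 1000) + " s");
  }
}

Run it once with CompressionType.NONE and once with BLOCK to get a feel for
how much the write itself, and the compression, actually cost.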

2011/5/12 丛林 <co...@gmail.com>

> Dear Harsh,
>
> Will you please explain how to create a SequenceFile with MapReduce?
>
> Suppose that all 32GB of little files are stored on one PC.
>
> Thanks for your suggestions.
>
> BTW: I notice that you have answered most of the SequenceFile topics on
> this mailing list :-)
>
> Best Wishes,
>
> -Lin
>
>
> 2011/5/12 Harsh J <ha...@cloudera.com>:
> > Are you doing this as a MapReduce job, or is it a simple linear
> > program? MapReduce could be much faster (a combined-files input format,
> > with a few reducers for merging if you need that as well).
> >
> > On Thu, May 12, 2011 at 5:18 AM, 丛林 <co...@gmail.com> wrote:
> >> Hi, all.
> >>
> >> I want to write lots of little files (32GB in total) to HDFS as a
> >> single org.apache.hadoop.io.SequenceFile.
> >>
> >> But it is too slow: it takes about 8 hours to create this
> >> SequenceFile (6.7GB).
> >>
> >> So I wonder: how can I create this SequenceFile faster?
> >>
> >> Thanks for your suggestions.
> >>
> >> -Best Wishes,
> >>
> >> -Lin
> >>
> >
> >
> >
> > --
> > Harsh J
> >
>



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com

Re: How to create a SequenceFile faster?

Posted by 丛林 <co...@gmail.com>.
Dear Harsh,

Will you please explain how to create a SequenceFile with MapReduce?

Suppose that all 32GB of little files are stored on one PC.

Thanks for your suggestions.

BTW: I notice that you have answered most of the SequenceFile topics on
this mailing list :-)

Best Wishes,

-Lin


2011/5/12 Harsh J <ha...@cloudera.com>:
> Are you doing this as a MapReduce job, or is it a simple linear
> program? MapReduce could be much faster (a combined-files input format,
> with a few reducers for merging if you need that as well).
>
> On Thu, May 12, 2011 at 5:18 AM, 丛林 <co...@gmail.com> wrote:
>> Hi, all.
>>
>> I want to write lots of little files (32GB in total) to HDFS as a
>> single org.apache.hadoop.io.SequenceFile.
>>
>> But it is too slow: it takes about 8 hours to create this
>> SequenceFile (6.7GB).
>>
>> So I wonder: how can I create this SequenceFile faster?
>>
>> Thanks for your suggestions.
>>
>> -Best Wishes,
>>
>> -Lin
>>
>
>
>
> --
> Harsh J
>

Re: How to create a SequenceFile faster?

Posted by Harsh J <ha...@cloudera.com>.
Are you doing this as a MapReduce job, or is it a simple linear
program? MapReduce could be much faster (a combined-files input format,
with a few reducers for merging if you need that as well).
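
A rough sketch of such a job, assuming a Hadoop release that ships
CombineTextInputFormat (on older releases you would subclass
CombineFileInputFormat yourself); the class name and split size below are
illustrative only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver: converts a directory of small text files into
// block-compressed SequenceFiles.
public class SmallFilesToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "small files to SequenceFile");
    job.setJarByClass(SmallFilesToSequenceFile.class);

    // Pack many small files into each split so a single map task reads
    // hundreds of files instead of one file per task.
    job.setInputFormatClass(CombineTextInputFormat.class);
    CombineTextInputFormat.setMaxInputSplitSize(job, 128 * 1024 * 1024);

    // Identity mapper: passes (offset, line) pairs straight through.
    job.setMapperClass(Mapper.class);
    // Map-only job; set a few reducers instead if you want fewer,
    // merged output files.
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Write block-compressed SequenceFiles.
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setOutputCompressionType(job,
        SequenceFile.CompressionType.BLOCK);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each map task then reads many small files per split instead of one file per
task, which is usually where the speedup comes from.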

On Thu, May 12, 2011 at 5:18 AM, 丛林 <co...@gmail.com> wrote:
> Hi, all.
>
> I want to write lots of little files (32GB in total) to HDFS as a
> single org.apache.hadoop.io.SequenceFile.
>
> But it is too slow: it takes about 8 hours to create this
> SequenceFile (6.7GB).
>
> So I wonder: how can I create this SequenceFile faster?
>
> Thanks for your suggestions.
>
> -Best Wishes,
>
> -Lin
>



-- 
Harsh J