Posted to common-user@hadoop.apache.org by ch...@students.iiit.ac.in on 2008/07/01 23:18:20 UTC

taking lot of time in doing map task after 5% completion

Hi,
   We are converting 1.6 million text inputs into images using Hadoop,
but the job slows down sharply as it runs: it completes the first 1% in
4 minutes, yet has reached only 3%-4% after 1 hour, and it takes a very
long time to proceed toward 100%. Is there anything wrong with my Hadoop
setup, or is there some other problem? The job runs very fast with an
input of 1000 or 5000 items, taking only 23 sec to 1 min 13 sec. Each
generated image is around 13-30 kilobytes.

 Thank you,

Regards,
Charan.T.
Chaitanya.VV.



Re: taking lot of time in doing map task after 5% completion

Posted by ch...@students.iiit.ac.in.
> On 7/1/08 2:18 PM, "charan@students.iiit.ac.in"
> <ch...@students.iiit.ac.in>
> wrote:
>>    We are working on conversion of 1.6 million text data inputs into
>> images , for this we are using hadoop but we are having a problem like
>> it is performing 1% of this job in 4 minutes and 3%-4% in 1 hr ... and
>> it is taking lot of time when it is proceeding to 100% . Is there any
>> thing wrong in my hadoop setup or any other problem . Because it works
>> too fast when i give a input of 1000 or 5000 taking only 23 sec - 1 min
>> 13sec . my created image size will be around 13-30 kilobytes
>
>     It sounds as though you have lots and lots of really small files.
> HDFS doesn't perform well under those conditions and will typically
> send the name node java process into a garbage collection tail spin.
> Try combining the data into bigger files.
>

  Thank you, Allen.
          We are using 1 input file containing 1.5 million words, and we are
creating an image for each word. The images are spread over 1500 directories
(50 at level 1, each containing 30 at level 2), with each directory holding
1000 images.
                        Will there be any problem in doing so?
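
To make the scale of that layout concrete, here is a rough back-of-the-envelope
sketch in Python. The directory counts and the 13-30 kilobyte image sizes come
from the messages above; treating a kilobyte as 1024 bytes and using a 64 MiB
block size are assumptions made only for illustration, so adjust them to match
the actual cluster configuration.

```python
# Directory layout described in the message above.
LEVEL1_DIRS = 50
LEVEL2_DIRS_PER_LEVEL1 = 30
IMAGES_PER_DIR = 1000

# Image sizes quoted in the thread; 1 KB = 1024 bytes is an assumption.
MIN_IMAGE_BYTES = 13 * 1024
MAX_IMAGE_BYTES = 30 * 1024

# Assumed block size (not stated in the thread); check your own configuration.
ASSUMED_BLOCK_BYTES = 64 * 1024 * 1024


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


leaf_dirs = LEVEL1_DIRS * LEVEL2_DIRS_PER_LEVEL1
total_images = leaf_dirs * IMAGES_PER_DIR

# Total output volume at the low and high end of the quoted image sizes.
min_total_bytes = total_images * MIN_IMAGE_BYTES
max_total_bytes = total_images * MAX_IMAGE_BYTES

# How many block-sized files the same data would need if it were combined.
combined_files_min = ceil_div(min_total_bytes, ASSUMED_BLOCK_BYTES)
combined_files_max = ceil_div(max_total_bytes, ASSUMED_BLOCK_BYTES)

print("leaf directories:", leaf_dirs)
print("separate image files:", total_images)
print("block-sized files if combined:", combined_files_min, "to", combined_files_max)
```

The point of the comparison is the ratio between the number of separate image
files and the number of block-sized files that could hold the same bytes.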




Re: taking lot of time in doing map task after 5% completion

Posted by Allen Wittenauer <aw...@yahoo-inc.com>.
On 7/1/08 2:18 PM, "charan@students.iiit.ac.in" <ch...@students.iiit.ac.in>
wrote:
>    We are working on conversion of 1.6 million text data inputs into
> images , for this we are using hadoop but we are having a problem like
> it is performing 1% of this job in 4 minutes and 3%-4% in 1 hr ... and
> it is taking lot of time when it is proceeding to 100% . Is there any
> thing wrong in my hadoop setup or any other problem . Because it works
> too fast when i give a input of 1000 or 5000 taking only 23 sec - 1 min
> 13sec . my created image size will be around 13-30 kilobytes

    It sounds as though you have lots and lots of really small files.  HDFS
doesn't perform well under those conditions and will typically send the name
node Java process into a garbage collection tailspin.  Try combining the
data into bigger files.
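
As an illustration of what "combining the data into bigger files" can look
like, here is a minimal Python sketch that packs many small named payloads into
one stream of length-prefixed records and reads them back. It does not use any
Hadoop API; in Hadoop itself a container format such as SequenceFile plays this
role. The function names `pack_records` and `unpack_records` and the record
layout are hypothetical, chosen only for this example.

```python
import io
import struct

# Hypothetical record layout: two big-endian unsigned 32-bit lengths
# (name length, payload length), followed by the name and the payload.
HEADER = struct.Struct(">II")


def pack_records(records, out):
    """Write (name, payload) pairs to a binary stream; return the record count."""
    count = 0
    for name, payload in records:
        key = name.encode("utf-8")
        out.write(HEADER.pack(len(key), len(payload)))
        out.write(key)
        out.write(payload)
        count += 1
    return count


def unpack_records(inp):
    """Yield (name, payload) pairs from a stream written by pack_records."""
    while True:
        header = inp.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        key_len, payload_len = HEADER.unpack(header)
        key = inp.read(key_len)
        payload = inp.read(payload_len)
        if len(key) < key_len or len(payload) < payload_len:
            raise ValueError("truncated record body")
        yield key.decode("utf-8"), payload


if __name__ == "__main__":
    # Stand-ins for generated images: tiny byte strings instead of real PNGs.
    fake_images = [("word%d.png" % i, bytes([i]) * 16) for i in range(5)]
    buf = io.BytesIO()
    written = pack_records(fake_images, buf)
    buf.seek(0)
    restored = list(unpack_records(buf))
    print("records written:", written)
    print("round trip ok:", restored == fake_images)
```

With a layout like this, each map task can append its images to one large
output file instead of creating one file per word, and a later step can split
the records out again if individual image files are needed.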