Posted to user@hbase.apache.org by Stack <st...@duboce.net> on 2011/07/01 08:02:13 UTC

Re: Task attempt_201106301204_0007_m_000001_0 failed to report status for 600 seconds. Killing!

On Thu, Jun 30, 2011 at 8:07 AM, Shuja Rehman <sh...@gmail.com> wrote:
> I am doing bulk insertion into HBase using MapReduce, reading from a lot of
> small files (roughly 10 MB each), so the number of mappers equals the number
> of files. I am also monitoring performance with Ganglia. The machines are
> c1.xlarge for processing the files (task trackers + data nodes) and m1.xlarge
> for the HBase cluster (region servers + data nodes). CPU usage stays at
> 75%-100% on almost all of the servers, and RAM usage stays below 5 GB. But
> the job fails because a lot of maps get killed. If I run the same job without
> the insertion, processing completes in 9-10 minutes. So the question is: why
> is it killing so many maps? Any clue?
>
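
[Editor's note: the timeout in the subject is the MapReduce task timeout
(mapred.task.timeout, 600000 ms by default): a task that neither consumes
input, emits output, nor reports status for that long is killed by the
TaskTracker. Below is a minimal, hypothetical sketch of the kind of mapper
described above, writing Puts through TableOutputFormat. The class name,
column family, and record format are assumptions, not taken from this
thread, and the job setup (e.g. TableMapReduceUtil) is not shown. The
context.progress() call is the usual way to keep a slow task alive while
the regionserver side is investigated.]

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Hypothetical mapper: parses one line per record and emits one Put per line.
  public class InsertMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    private static final byte[] FAMILY = Bytes.toBytes("d");      // assumed column family
    private static final byte[] QUALIFIER = Bytes.toBytes("val"); // assumed qualifier

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Hypothetical record format: rowkey<TAB>value
      String[] parts = value.toString().split("\t", 2);
      if (parts.length < 2) {
        return; // skip malformed lines
      }
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.add(FAMILY, QUALIFIER, Bytes.toBytes(parts[1]));
      context.write(new ImmutableBytesWritable(row), put);

      // Tell the framework the task is alive, so it is not killed for
      // failing to report status even if HBase writes are slow.
      context.progress();
    }
  }

Reporting progress only hides the symptom, though: if writes take minutes,
something on the regionserver side still needs to be diagnosed, which is
what the reply below is getting at.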

Can you figure out which region the map tasks are failing against?  And
once you have the region, which regionserver was hosting it (grep the
master log to figure this out)?  Thereafter, check the RS logs around the
time of the map task timeout.  See anything?  A long GC?  A region split?
600 seconds is a long time for the server side to be hung up.
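
[Editor's note: one way to see which regionserver currently hosts the
region for a given row is the client API; a minimal sketch follows,
assuming a 0.90-era client (the table name and row key are placeholders).
In later versions, getServerAddress() on HRegionLocation was replaced by
getHostname()/getPort().]

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HRegionLocation;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.util.Bytes;

  // Hypothetical helper: given a table and a row key from a failed map task,
  // print the region and the regionserver currently hosting it.
  public class WhereIsMyRow {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");     // assumed table name
      byte[] row = Bytes.toBytes("some-row-key");     // assumed row key
      HRegionLocation loc = table.getRegionLocation(row);
      System.out.println("Region: "
          + loc.getRegionInfo().getRegionNameAsString());
      System.out.println("Hosted on: "
          + loc.getServerAddress().getHostname());
      table.close();
    }
  }

Note this shows the current assignment only; the master log (as suggested
above) is still the way to find which server held the region at the time
of the timeout, since the region may have moved or split since then.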

What version of HBase?

Thanks,
St.Ack