Posted to hdfs-user@hadoop.apache.org by Huanchen Zhang <ia...@gmail.com> on 2012/10/05 00:03:33 UTC

Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file

Hello,

I have a question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file.

Currently, about three mappers take roughly five times longer than the others to complete. How can I detect which specific files those three mappers are processing? And if that is doable, how can I assign more mappers to process those specific files?

Thank you!

Best,
Huanchen

Re: Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

Roughly, this information is available on the 'Hadoop map task list'
page in the MapReduce web UI (in Hadoop-1.0, which I am assuming is what
you are using). You can reach this page by following the running tasks
link from the job information page. The page has a table listing all the
tasks, and the status column tells you which part of the input each task
is processing. Please note that, depending on the input format chosen, a
task may be processing a *part* of a file, and not necessarily a whole
file.
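
If the status column alone doesn't make it obvious which file a slow task
is reading, the mapper itself can also report its input path. Below is a
rough sketch, not something from your job - the class name is mine, and it
assumes an input format that produces FileSplits (e.g. TextInputFormat):

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class InputLoggingMapper
        extends Mapper<LongWritable, Text, LongWritable, Text> {

      @Override
      protected void setup(Context context)
          throws IOException, InterruptedException {
        // The cast works for file-based input formats such as TextInputFormat.
        FileSplit split = (FileSplit) context.getInputSplit();
        // Surface the file and offset in the task status and in the task logs,
        // so the slow attempts can be matched to their input files.
        context.setStatus("reading " + split.getPath()
            + " @ offset " + split.getStart());
        System.err.println("Input split: " + split.getPath()
            + " start=" + split.getStart() + " length=" + split.getLength());
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        context.write(key, value); // identity map, just for the sketch
      }
    }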

Another good source of information on why these particular tasks are slow
is the job's counters. These counters can also be accessed from the task
list page in the web UI.

It would help if you could provide more information - such as what job
you're trying to run, the input format specified, etc.
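
On the second part of your question - assigning more mappers to a
particular file - if that file is splittable (plain uncompressed text, for
instance), one option is to lower the maximum split size so that the file
is divided among more map tasks. A minimal sketch with the new-API
FileInputFormat; the class name and the 64 MB cap are only illustrative,
not values from your job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SmallSplitsJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "small-splits");

        job.setJarByClass(SmallSplitsJob.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Cap each input split at 64 MB so a single large, splittable file
        // is spread across more map tasks (the value is only an example).
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

        // Identity map, no reduce - plug in your own classes here.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The same thing can, I believe, also be done by setting mapred.max.split.size
in the job configuration. Note this only helps with splittable inputs; a
gzip-compressed text file, for example, will still go to a single mapper.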

Thanks
hemanth

On Fri, Oct 5, 2012 at 3:33 AM, Huanchen Zhang <ia...@gmail.com> wrote:

> Hello,
>
> I have a question about how to find which file takes the longest time to
> process and how to assign more mappers to process that particular file.
>
> Currently, about three mappers take roughly five times longer than the
> others to complete. How can I detect which specific files those three
> mappers are processing? And if that is doable, how can I assign more
> mappers to process those specific files?
>
> Thank you!
>
> Best,
> Huanchen
