Posted to mapreduce-user@hadoop.apache.org by Roy Smith <ro...@panix.com> on 2013/01/11 05:02:13 UTC

How to interpret the progress meter?

I'm running a job that looks like it's going to take about 12 hours on 4 EC2 instances.  I don't really understand the "complete" percentages reported by http://localhost:9100/jobtasks.jsp.  They are extremely non-linear.  For my reduce steps, they ramp up to 40-60% in just a few minutes, then take hours to slowly inch their way up the rest of the way to 100%.

What does the "complete" percentage really mean?

--
Roy Smith
roy@panix.com


Re: How to interpret the progress meter?

Posted by Mahesh Balija <ba...@gmail.com>.
Hi Smith,

In my experience, the actual processing usually happens in roughly the
first 40% to 70%; the remainder is devoted to writing/flushing the data
to the output files, which usually takes more time.

Best,
Mahesh Balija,
Calsoft Labs.

On Fri, Jan 11, 2013 at 9:32 AM, Roy Smith <ro...@panix.com> wrote:

> I'm running a job that looks like it's going to take about 12 hours on 4
> EC2 instances.  I don't really understand the "complete" percentages
> reported by http://localhost:9100/jobtasks.jsp.  They are extremely
> non-linear.  For my reduce steps, they ramp up to 40-60% in just a few
> minutes, then take hours to slowly inch their way up the rest of the way to
> 100%.
>
> What does the "complete" percentage really mean?
>
> --
> Roy Smith
> roy@panix.com
>
>

Re: How to interpret the progress meter?

Posted by Harsh J <ha...@cloudera.com>.
The map-side percentage is whatever the map's record reader reports as
its progress. The reduce side is divided into 3 phases of ~33% each:
shuffle (fetch data), sort, and finally user code (reduce). It is
normal to see jumps between these values, depending on how much work
each phase has to do.
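To make the arithmetic concrete, here is a small Python sketch (not Hadoop code) of the two rules described above. It assumes the three reduce phases are weighted equally, as stated, and models map progress as bytes consumed over split length, which is how a line-oriented record reader typically reports it; other record readers may report progress differently. The function names are hypothetical and not part of any Hadoop API.

```python
# Sketch of how a task's "complete" percentage could be derived, assuming
# map progress = fraction of the input split read, and reduce progress =
# three equally weighted phases (shuffle, sort, reduce).
# Function names are hypothetical, not Hadoop APIs.

REDUCE_PHASES = ("shuffle", "sort", "reduce")  # assumed ~33% each


def map_progress(start: int, pos: int, end: int) -> float:
    """Fraction of an input split consumed: bytes read so far / split length."""
    if start == end:
        return 0.0
    return min(1.0, (pos - start) / (end - start))


def reduce_progress(phase: str, fraction_in_phase: float) -> float:
    """Overall reduce-task progress when each phase counts for 1/3.

    Completed phases contribute a full third each; the current phase
    contributes its own fraction scaled down to a third.
    """
    if phase not in REDUCE_PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    if not 0.0 <= fraction_in_phase <= 1.0:
        raise ValueError("fraction_in_phase must be between 0 and 1")
    completed = REDUCE_PHASES.index(phase)
    return (completed + fraction_in_phase) / len(REDUCE_PHASES)


if __name__ == "__main__":
    # Shuffle and sort finished, user reduce code just started:
    print(f"{reduce_progress('reduce', 0.0):.0%}")  # 67%
    # Halfway through the user-code phase:
    print(f"{reduce_progress('reduce', 0.5):.0%}")  # 83%
    # A map task 3/4 of the way through a 128 MB split:
    print(f"{map_progress(0, 96 * 2**20, 128 * 2**20):.0%}")  # 75%
```

Under these assumptions a reducer can already show about 67% once shuffle and sort are done, even though the user reduce code, often the slowest part, has not processed any input yet. That could explain a quick climb followed by a long tail.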

On Fri, Jan 11, 2013 at 9:32 AM, Roy Smith <ro...@panix.com> wrote:
> I'm running a job that looks like it's going to take about 12 hours on 4 EC2
> instances.  I don't really understand the "complete" percentages reported by
> http://localhost:9100/jobtasks.jsp.  They are extremely non-linear.  For my
> reduce steps, they ramp up to 40-60% in just a few minutes, then take hours
> to slowly inch their way up the rest of the way to 100%.
>
> What does the "complete" percentage really mean?
>
> --
> Roy Smith
> roy@panix.com
>



-- 
Harsh J