Posted to common-dev@hadoop.apache.org by Michel Tourn <mi...@yahoo-inc.com> on 2006/02/10 01:01:24 UTC
MapRed status information
Currently the MapRed status information looks like:
060208 223641 map 0% reduce 100%
060208 223641 Job complete: job_ityg9w
This shows the percentage complete for a given job.
This is nice, but I would also like to see some absolute numbers:
060208 223641 map 50% (2456) reduce 100% (123)
060208 223641 Job complete: job_ityg9w
which tells me:
2456 input records are 50% of the Map job
123 output records are 100% of the Reduce job
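A minimal sketch of how such a line could be built, assuming the tracker can supply a progress fraction and an absolute record count for each phase. StatusLine, phase and format are hypothetical names for illustration, not existing Hadoop code:

```java
// Hypothetical helper (not part of Hadoop): builds a status line carrying
// both the percentage and the absolute record count for each phase.
public class StatusLine {

    // progress is a fraction in [0, 1]; records is the absolute count so far.
    static String phase(String name, double progress, long records) {
        long pct = Math.round(progress * 100.0);
        return name + " " + pct + "% (" + records + ")";
    }

    static String format(double mapProgress, long mapRecords,
                         double reduceProgress, long reduceRecords) {
        return phase("map", mapProgress, mapRecords) + " "
                + phase("reduce", reduceProgress, reduceRecords);
    }

    public static void main(String[] args) {
        // Prints: map 50% (2456) reduce 100% (123)
        System.out.println(format(0.5, 2456, 1.0, 123));
    }
}
```

The count is appended next to the percentage rather than replacing it, so existing readers of the status line still see the same leading fields.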
This is particularly useful when you get
your DFS paths or file wildcards wrong:
MapRed finds zero input files, happily processes
an empty job, and completes with:
060208 223641 map 100% reduce 100%
...which is 100% of zero records.
Completing successfully for an empty job is the right behaviour.
But we need a more informative status.
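For illustration only, a hypothetical completion message that still reports success but makes an empty input visible. EmptyJobCheck and completionMessage are made-up names, and the sketch assumes the number of map input records is known when the job finishes:

```java
// Hypothetical sketch: a completion message that still reports success for an
// empty job, but makes the zero input record count impossible to miss.
public class EmptyJobCheck {

    static String completionMessage(String jobId, long mapInputRecords) {
        String msg = "Job complete: " + jobId
                + " (" + mapInputRecords + " input records)";
        if (mapInputRecords == 0) {
            // Success is still correct; this only flags a likely path/wildcard mistake.
            msg += " WARNING: no input records, check input paths and wildcards";
        }
        return msg;
    }

    public static void main(String[] args) {
        System.out.println(completionMessage("job_ityg9w", 0));
    }
}
```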
This could be combined with work on ETA estimation for the job:
060208 223641 map 50% (2456) ETA: +3h20m 060209 015641
...
060209 010000 Job complete: job_ityg9w
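One simple way to get such an ETA is linear extrapolation: if a fraction p of the work took time t, assume the rest takes t * (1 - p) / p. The sketch below is hypothetical (EtaEstimate is not a Hadoop class), the 3h20m elapsed time in main is an assumed input, and it assumes the log timestamps use a yyMMdd HHmmss layout, which is what the sample lines appear to use:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch (not Hadoop code): linear ETA extrapolation.
public class EtaEstimate {

    // If a fraction `progress` of the work took elapsedMillis, assume the
    // remaining (1 - progress) proceeds at the same rate.
    static long remainingMillis(long elapsedMillis, double progress) {
        if (progress <= 0.0) {
            return -1; // no progress yet, so no basis for an estimate
        }
        if (progress >= 1.0) {
            return 0;
        }
        return Math.round(elapsedMillis * (1.0 - progress) / progress);
    }

    // Renders a duration in the style of the mock-up, e.g. "+3h20m".
    static String formatRemaining(long millis) {
        if (millis < 0) {
            return "unknown";
        }
        long minutes = millis / 60_000;
        return "+" + (minutes / 60) + "h" + (minutes % 60) + "m";
    }

    // Adds the remaining time to a log timestamp, assuming the
    // "yyMMdd HHmmss" layout (UTC is used so DST cannot shift the result).
    static String etaTimestamp(String now, long remainingMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyMMdd HHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date start = fmt.parse(now);
            return fmt.format(new Date(start.getTime() + remainingMillis));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad timestamp: " + now, e);
        }
    }

    public static void main(String[] args) {
        long elapsed = 200L * 60_000; // assumed: 3h20m elapsed when maps hit 50%
        long remaining = remainingMillis(elapsed, 0.5);
        // Prints: map 50% (2456) ETA: +3h20m 060209 015641
        System.out.println("map 50% (2456) ETA: " + formatRemaining(remaining)
                + " " + etaTimestamp("060208 223641", remaining));
    }
}
```

Linear extrapolation assumes a constant rate across the whole job, so it ignores differences between the map and reduce phases; it is a rough guide rather than a prediction.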
Can somebody comment on the feasibility of this?
1. Is it easy to get the absolute number of records processed?
2. Is it easy to get (an estimate(*) of) the total number of records
to be processed in the job?
3. Where should this be computed?
Should it be computed by a client polling for status?
Can the information also be made available to the job.tracker.info Web UI?
(*) Extrapolation based on the file fragment sizes and the bytes per record seen so far.
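The extrapolation in the footnote could look roughly like this, assuming the total size of the input splits is known up front and the tasks report bytes read and records read so far. RecordEstimate and its parameters are hypothetical:

```java
// Hypothetical sketch: extrapolate the total record count from the total size
// of the input splits and the bytes per record observed so far.
public class RecordEstimate {

    static long estimateTotalRecords(long[] splitSizes,
                                     long bytesReadSoFar, long recordsSoFar) {
        long totalBytes = 0;
        for (long size : splitSizes) {
            totalBytes += size;
        }
        if (totalBytes == 0) {
            return 0; // empty input: the total is known exactly
        }
        if (bytesReadSoFar <= 0 || recordsSoFar <= 0) {
            return -1; // nothing read yet, so bytes/record is unknown
        }
        double bytesPerRecord = (double) bytesReadSoFar / recordsSoFar;
        return Math.round(totalBytes / bytesPerRecord);
    }

    public static void main(String[] args) {
        long[] splits = {1000, 1000, 500};
        // 1000 bytes read as 40 records is 25 bytes/record, so 2500 / 25 = 100.
        System.out.println(estimateTotalRecords(splits, 1000, 40));
    }
}
```

An empty input returns 0 exactly, so the same estimate also covers the empty-job case without any extrapolation.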
Thanks,
Michel