Posted to common-user@hadoop.apache.org by Brensch <as...@sub.uni-goettingen.de> on 2008/03/26 09:57:22 UTC

Howto?: Monitor File/Job allocation

Hello everybody,

I've been playing with Hadoop for a few days, and I'm only starting to
explore its beauty.

In an attempt to learn from the Grep example, I ended up wondering whether
you can actually find out, from within a map, which file you are currently
running on.
E.g. suppose I want to grep through a set of files and, instead of having
only a global result, I need an output per file as well.

> ./bin/hadoop jar hadoop-0.16.1-examples.jar grep input output "au[a-c]"

> input/file1.txt 3 aua
> input/file1.txt 2 aub
> input/file1.txt 1 auc

> input/file2.txt 1 aua
> input/file2.txt 2 aub
> input/file2.txt 3 auc

> 4 aua
> 4 aub
> 4 auc


Now this could be really easy to do (just read the right variable in the
JobConf?) or it could be absolutely impossible, since it's Hadoop's innate
goal to abstract away file-related details. I'd really appreciate a hint or
a link to read about this.

regards,
Brensch


Re: Howto?: Monitor File/Job allocation

Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
From here:

http://wiki.apache.org/hadoop/TaskExecutionEnvironment

The following properties are localized for each task's JobConf:

Name                    Type     Description
mapred.job.id           String   The job id
mapred.task.id          String   The task id
mapred.task.is.map      boolean  Is this a map task
mapred.task.partition   int      The id of the task within the job
map.input.file          String   The filename that the map is reading from
map.input.start         long     The offset of the start of the map input split
map.input.length        long     The number of bytes in the map input split
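
So inside your mapper you can pick up map.input.file in configure() and fold
the filename into your output keys. Here is a rough, untested sketch against
the old org.apache.hadoop.mapred API; the class name and the "grep.pattern"
job property are made up for illustration, and this simplifies what the
bundled Grep example actually does:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PerFileGrepMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private static final LongWritable ONE = new LongWritable(1);

  private String inputFile; // localized per map task
  private Pattern pattern;

  public void configure(JobConf job) {
    // map.input.file is set for each map task (see the table above).
    inputFile = job.get("map.input.file");
    // "grep.pattern" is a made-up job property the driver would set.
    pattern = Pattern.compile(job.get("grep.pattern"));
  }

  public void map(LongWritable key, Text line,
                  OutputCollector<Text, LongWritable> output,
                  Reporter reporter) throws IOException {
    Matcher m = pattern.matcher(line.toString());
    while (m.find()) {
      // Keying on (file, match) makes the reducer's sums per-file;
      // summing over the match alone would recover the global counts.
      output.collect(new Text(inputFile + "\t" + m.group()), ONE);
    }
  }
}

A plain summing reducer (e.g. org.apache.hadoop.mapred.lib.LongSumReducer)
then gives you one count per file and pattern, which is the per-file half of
the output you sketched; the global totals would take a second pass or a
small post-processing step.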



