Posted to user@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2011/05/31 19:08:10 UTC

How to see the specific file assigned to a mapper?

Context: I have a bunch of files living in HDFS, and I think my jobs are
failing on one of them... I want to output the files that the job is failing
on.

I thought that I could just write my own LoadFunc that follows the same
approach as PigStorage, but catches exceptions and logs the file it was
given. This isn't working, however. I tried returning loadLocation, but
that is the globbed input, not the input to the mapper. I also tried reading
mapreduce.map.file.input and map.file.input from the Job given to
setLocation, but both were null. I think this is where my ignorance of
Pig's internal workings comes into play, as I'm not sure when the globs are
expanded and when the splits are actually read. I tried using
getLocations() from the PigSplit passed to prepareToRead, but that was just
the glob as well...

My next thought would be to make a RecordReader that outputs the file
associated with its split (I assume the reader has to know the specific
file it is processing?), but I thought I'd ask if there was a cleaner way
before doing that...
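
A minimal sketch of the RecordReader idea above, written in plain Java with
stand-in types rather than the real Pig or Hadoop classes, so it runs on its
own. Split, PathAwareReader, failingPaths and the /data/... paths are all
hypothetical names made up for illustration. The point is only the pattern:
the reader keeps the concrete path of its split and attaches it to any
parsing failure. In Pig itself, the concrete per-mapper path can probably be
reached through PigSplit.getWrappedSplit(), cast to Hadoop's FileSplit when
the input is file based, and then getPath(); treat that as an assumption to
check against your Pig version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: a reader that remembers which file its split came from. */
public class SplitPathLogging {

    /** Hypothetical stand-in for a per-mapper split: one concrete path, never a glob. */
    static final class Split {
        final String path;
        final String contents;

        Split(String path, String contents) {
            this.path = path;
            this.contents = contents;
        }
    }

    /** Hypothetical stand-in for a RecordReader that parses one integer per line. */
    static final class PathAwareReader {
        private final Split split;
        private final BufferedReader in;

        PathAwareReader(Split split) {
            this.split = split;
            this.in = new BufferedReader(new StringReader(split.contents));
        }

        /** Returns the next record, or null at end of split; failures carry the path. */
        Integer next() throws IOException {
            String line = in.readLine();
            if (line == null) {
                return null;
            }
            try {
                return Integer.parseInt(line.trim());
            } catch (NumberFormatException e) {
                throw new IOException("Bad record in " + split.path + ": '" + line + "'", e);
            }
        }
    }

    /** Reads every split to the end and collects the paths of the ones that fail. */
    static List<String> failingPaths(List<Split> splits) {
        List<String> failed = new ArrayList<>();
        for (Split s : splits) {
            PathAwareReader reader = new PathAwareReader(s);
            try {
                while (reader.next() != null) {
                    // A real loader would turn each record into a tuple here.
                }
            } catch (IOException e) {
                System.err.println(e.getMessage());
                failed.add(s.path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(
                new Split("/data/part-00000", "1\n2\n3\n"),
                new Split("/data/part-00001", "4\noops\n6\n"));
        System.out.println(failingPaths(splits)); // prints [/data/part-00001]
    }
}
```

In a real LoadFunc the same try/catch would sit in getNext(), with the path
captured once in prepareToRead and written to the task log on failure.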

Thanks!
Jon

Re: How to see the specific file assigned to a mapper?

Posted by Jonathan Coveney <jc...@gmail.com>.
Thanks Xiaomeng!

2011/5/31 Xiaomeng Wan <sh...@gmail.com>

> I asked a similar question before. Please see this thread
>
>
> http://mail-archives.apache.org/mod_mbox/pig-user/201103.mbox/%3CAANLkTimqkjAZfSTyW8u6S5Mi29a+=5u=ayVMuvoykacx@mail.gmail.com%3E
>
> Shawn

Re: How to see the specific file assigned to a mapper?

Posted by Xiaomeng Wan <sh...@gmail.com>.
I asked a similar question before. Please see this thread

http://mail-archives.apache.org/mod_mbox/pig-user/201103.mbox/%3CAANLkTimqkjAZfSTyW8u6S5Mi29a+=5u=ayVMuvoykacx@mail.gmail.com%3E

Shawn
