Posted to common-user@hadoop.apache.org by "Liu, Raymond" <ra...@intel.com> on 2012/08/10 05:22:42 UTC
How can I get the intermediate output file from mapper class?
Hi
I am trying to access the intermediate files that MapReduce mappers save to the local filesystem.
I found this one via Google: http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
I am using Hadoop 1.0.3, and I set the following property in mapred-site.xml:
<property>
<name>keep.task.files.pattern</name>
<value>.*_m_00000*</value>
</property>
After restarting Hadoop and running some jobs, I can see job directories in my local dir, like:
/mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
But I still cannot find any output dir there.
I have four disks mounted for the local dir, configured as follows, and only the jars and work dirs can be found:
<property>
<name>mapred.local.dir</name>
<value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
</property>
Then I searched through them:
raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
jars job.xml
raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
jobToken work
raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
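The four ls commands above follow one pattern per mapred.local.dir entry. A minimal Python sketch that expands the comma-separated value into the per-disk job cache paths, assuming the taskTracker/<user>/jobcache/<job id> layout visible in the paths in this message; jobcache_dirs is a hypothetical helper, not a Hadoop API:

```python
import os
from typing import List


def jobcache_dirs(local_dirs: str, user: str, job_id: str) -> List[str]:
    """Expand a comma-separated mapred.local.dir value into per-disk job dirs.

    Assumed layout (taken from the paths in this thread, not from Hadoop docs):
        <local dir>/taskTracker/<user>/jobcache/<job id>
    """
    return [
        os.path.join(d.strip(), "taskTracker", user, "jobcache", job_id)
        for d in local_dirs.split(",")
        if d.strip()  # skip blank entries, e.g. from a trailing comma
    ]


if __name__ == "__main__":
    value = (
        "/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,"
        "/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred"
    )
    for path in jobcache_dirs(value, "raymond", "job_201208101040_0003"):
        # Equivalent to running `ls` on each disk's job directory.
        contents = sorted(os.listdir(path)) if os.path.isdir(path) else "(missing)"
        print(path, contents)
```

Checking every disk matters because the TaskTracker spreads a job's files across all configured local dirs, as the listings show (jars and job.xml on disk 1, jobToken and work on disk 3).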
I also searched the ttprivate dir, with no luck:
raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
/mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
So, is there anything I am still missing?
Best Regards,
Raymond Liu
RE: How can I get the intermediate output file from mapper class?
Posted by "Liu, Raymond" <ra...@intel.com>.
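Alright, I finally managed to get the intermediate files.
The pattern should be ".*_m_0000.*" instead of ".*_m_0000*"... stupid me. In a regex, "0*" means "zero or more 0s", not "0 followed by anything" as it would in a shell glob.
If you want to keep everything, use ".*" as the pattern. ;)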
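The difference between the two patterns can be checked outside Hadoop. A minimal Python sketch, assuming that Hadoop 1.x tests keep.task.files.pattern against the whole task attempt ID (Java's Pattern.matches semantics, which re.fullmatch mirrors for these simple patterns) rather than against the task name or file names; keeps_files is a hypothetical helper:

```python
import re

# IDs from this thread: the task-style name from the question, and the
# attempt-style ID from the ttprivate path.
TASK_ID = "task_201208101126_0004_m_000000"
ATTEMPT_ID = "attempt_201208101040_0003_m_000021_0"


def keeps_files(pattern: str, task_id: str) -> bool:
    """True if `pattern` matches the WHOLE ID (assumed Pattern.matches semantics)."""
    return re.fullmatch(pattern, task_id) is not None


for pattern in (".*_m_0000*", ".*_m_0000.*", ".out*", ".*"):
    print(f"{pattern:12} task: {keeps_files(pattern, TASK_ID)!s:5} "
          f"attempt: {keeps_files(pattern, ATTEMPT_ID)}")
```

Under that assumption, ".*_m_0000*" matches the task-style name (which ends in "_m_000" plus zeros) but never an attempt ID, which always ends in "_<attempt number>". Note also that ".*_m_0000.*" still only keeps map tasks numbered below 100 (000000 to 000099).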
Best Regards,
Raymond Liu
> -----Original Message-----
> From: Liu, Raymond [mailto:raymond.liu@intel.com]
> Sent: Friday, August 10, 2012 2:42 PM
> To: Harsh J; common-user@hadoop.apache.org
> Subject: RE: How can I get the intermediate output file from mapper class?
>
> Hi Harsh
>
> Thanks for your reply, but I don't quite understand what you mean...
> According to the description:
>
> <property>
> <name>keep.task.files.pattern</name>
> <value>.*_m_0000*</value>
> <description>Keep all files from tasks whose task names match the given
> regular expression. Defaults to none.</description>
> </property>
>
>
> Isn't that pattern applied to the task name? The task name is something like
> task_201208101126_0004_m_000000, so shouldn't this pattern prevent all the
> data from those tasks from being cleaned up?
>
> If this doesn't work, could you kindly show me the exact pattern I
> should put here for the map->intermediate->reduce files (the merged
> partition file waiting to be shuffled to the reduce tasks)? I tried ".out*", and it
> doesn't work either.
>
> Or should I modify some other property instead?
>
>
> Best Regards,
> Raymond Liu
RE: How can I get the intermediate output file from mapper class?
Posted by "Liu, Raymond" <ra...@intel.com>.
Hi Harsh
Thanks for your reply, but I don't quite understand what you mean... According to the description:
<property>
<name>keep.task.files.pattern</name>
<value>.*_m_0000*</value>
<description>Keep all files from tasks whose task names match the given
regular expression. Defaults to none.</description>
</property>
Isn't that pattern applied to the task name? The task name is something like task_201208101126_0004_m_000000, so shouldn't this pattern prevent all the data from those tasks from being cleaned up?
If this doesn't work, could you kindly show me the exact pattern I should put here for the map->intermediate->reduce files (the merged partition file waiting to be shuffled to the reduce tasks)? I tried ".out*", and it doesn't work either.
Or should I modify some other property instead?
Best Regards,
Raymond Liu
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 10, 2012 12:29 PM
> To: common-user@hadoop.apache.org
> Subject: Re: How can I get the intermediate output file from mapper class?
>
> Hi,
>
> You need the "file.out" and "file.out.index" files if you want the
> map->intermediate->reduce files. So try a pattern that matches these
> and you should have it.
>
> The "XXXXX" kind of files are what MR produces on HDFS as regular outputs -
> these aren't intermediate.
>
> --
> Harsh J
Re: How can I get the intermediate output file from mapper class?
Posted by Harsh J <ha...@cloudera.com>.
Hi,
You need the "file.out" and "file.out.index" files if you want the
map->intermediate->reduce files. So try a pattern that matches these
and you should have it.
The "XXXXX" kind of files are what MR produces on HDFS as regular
outputs - these aren't intermediate.
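Even with the right pattern, the kept files can land on any of the configured disks. A minimal Python sketch that walks every mapred.local.dir entry and collects files named file.out or file.out.index, the names mentioned above; it matches on file names only and assumes nothing about the directory layout beneath each root (find_map_outputs is a hypothetical helper):

```python
import os
from typing import List

# File names for the map->reduce intermediate data, as named in this thread.
INTERMEDIATE_NAMES = {"file.out", "file.out.index"}


def find_map_outputs(local_dirs: str) -> List[str]:
    """Return sorted paths of file.out / file.out.index under each local dir.

    `local_dirs` is a comma-separated mapred.local.dir value. Missing or
    unreadable directories are skipped, since os.walk ignores errors by default.
    """
    hits = []
    for root in (d.strip() for d in local_dirs.split(",") if d.strip()):
        for dirpath, _dirnames, filenames in os.walk(root):
            hits.extend(
                os.path.join(dirpath, name)
                for name in filenames
                if name in INTERMEDIATE_NAMES
            )
    return sorted(hits)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        for path in find_map_outputs(sys.argv[1]):
            print(path)
    else:
        print("usage: find_map_outputs.py <comma-separated mapred.local.dir value>")
```

Because os.walk silently skips directories it cannot read (such as ttprivate, if the user lacks permission), run it as a user who can read the TaskTracker dirs.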
On Fri, Aug 10, 2012 at 8:52 AM, Liu, Raymond <ra...@intel.com> wrote:
> Hi
>
> I am trying to access the intermediate files that MapReduce mappers save to the local filesystem.
>
> I found this one via Google: http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
>
> I am using Hadoop 1.0.3, and I set the following property in mapred-site.xml:
>
> <property>
> <name>keep.task.files.pattern</name>
> <value>.*_m_00000*</value>
> </property>
>
> After restarting Hadoop and running some jobs, I can see job directories in my local dir, like:
>
> /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
>
> But I still cannot find any output dir there.
>
> I have four disks mounted for the local dir, configured as follows, and only the jars and work dirs can be found:
>
> <property>
> <name>mapred.local.dir</name>
> <value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
> </property>
>
> Then I searched through them:
>
> raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> jars job.xml
> raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> jobToken work
> raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
>
> I also searched the ttprivate dir, with no luck:
>
> raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
>
> So, is there anything I am still missing?
>
>
> Best Regards,
> Raymond Liu
>
--
Harsh J