Posted to mapreduce-user@hadoop.apache.org by Harsh J <ha...@cloudera.com> on 2012/07/11 07:05:39 UTC

Re: equivalent of "-file" option in the programmatic call, (access jobID before submit())

Hadoop provides the DistributedCache API for this. See
http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html
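For illustration, a minimal sketch of what using that API from a driver might look like against the old (0.20.x) mapred API. The HDFS path and the "model" symlink name here are made-up placeholders; the file must already exist on HDFS before the job is submitted.

```java
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheExample.class);
        // Register an already-uploaded HDFS file; the "#model" fragment
        // asks for a symlink named "model" in each task's working directory.
        DistributedCache.addCacheFile(
            new URI("hdfs://namenode:8020/user/me/model.ser#model"), conf);
        DistributedCache.createSymlink(conf);
        // ... set mapper/reducer classes, then submit via JobClient.runJob(conf)
    }
}
```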

On Wed, Jul 11, 2012 at 9:15 AM, GUOJUN Zhu <gu...@freddiemac.com> wrote:
>
> Hi,
>
> I am using the programmatic call to submit the Hadoop job
> ("jobClient.submitJob( m_JobConf )").  I need to put a big object in the
> distributed cache, so I serialize it and send it over.  With the
> ToolRunner, I can use -file: the file is shipped into the job
> directory, and different jobs do not conflict.  However, there is no such
> option in the programmatic submission.
>
> I originally just uploaded the file into HDFS and then added the HDFS
> address to the distributed cache.  But to avoid conflicts between multiple
> jobs, I would like to add the jobID as a prefix or suffix to the remote
> name; however, I cannot access the jobID until the submitJob() call, which
> is too late for uploading files to HDFS.
>
> Alternatively, after reading through the source code, I added the property
> "tmpfiles" to the jobConf object before the submitJob() call:
>  conf.set( "tmpfiles", output.makeQualified( localFs ).toUri() + "#" +
> symlink );
> This seems to be the internal mechanism behind the "-file" option, but it
> feels very hacky.  It would be nice if Hadoop provided a more formal way to
> handle this.  Thanks.
>
> BTW: I am using 0.20.2 (CDH3u3)
>
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> guojun_zhu@freddiemac.com
> Financial Engineering
> Freddie Mac



-- 
Harsh J

Re: equivalent of "-file" option in the programmatic call, (access jobID before submit())

Posted by GUOJUN Zhu <gu...@freddiemac.com>.
Which method do you refer to?  I think DistributedCache.addLocalFiles()
only works for files that are local to the task nodes.  What I want is to
upload a file into the job-specific directory on HDFS and register it
with DistributedCache (and maybe clean it up after the job finishes).  Is
there an easy call to accomplish this?
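One possible shape for this, sketched against the 0.20.x mapred API: since the real JobID is not known before submitJob(), a client-generated unique id stands in for it. The "/tmp/staging-..." path, the local file path, and the use of a UUID are all assumptions for illustration, not an API Hadoop provides for this.

```java
import java.net.URI;
import java.util.UUID;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class StagedCacheFile {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(StagedCacheFile.class);
        FileSystem fs = FileSystem.get(conf);

        // Stage the local file under a per-run HDFS directory, keyed by a
        // client-side unique id (the JobID is not available yet).
        Path staged = new Path(
            "/tmp/staging-" + UUID.randomUUID() + "/big-object.ser");
        fs.copyFromLocalFile(new Path("/local/path/big-object.ser"), staged);

        // Register the staged file; "#big-object" symlinks it into each
        // task's working directory.
        DistributedCache.addCacheFile(
            new URI(staged.toUri() + "#big-object"), conf);
        DistributedCache.createSymlink(conf);

        RunningJob job = JobClient.runJob(conf);  // blocks until completion
        fs.delete(staged.getParent(), true);      // clean up the staging dir
    }
}
```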

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_zhu@freddiemac.com
Financial Engineering
Freddie Mac



   Harsh J <ha...@cloudera.com>
   07/11/2012 01:05 AM

Hadoop provides the DistributedCache API for this. See
http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html

-- 
Harsh J