Posted to common-user@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2010/04/10 01:14:46 UTC

-files flag question

I'm a little confused about how the -files flag works.  My understanding is that it takes two arguments: a file URI (local or on HDFS, assumed local if no URI scheme is provided) and a short "tag" representing the file in the distributed cache, usually just the name of the file without the long path that precedes it in the URI.

But -files can also pass multiple files to the distributed cache, so how does this all fit together?  Are odd arguments all URIs and even arguments all cache-tags?  Is it that simple?  I'm not really sure how to fit it all together if I need to send several files to the distributed cache (several shared libraries, for example).

Thanks.

________________________________________________________________________________
Keith Wiley               kwiley@keithwiley.com               www.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive compulsive and debilitatingly slow."
  -- Keith Wiley
________________________________________________________________________________




Re: -files flag question

Posted by Keith Wiley <kw...@keithwiley.com>.
So how does the example work where the second argument is simply  
separated by a space and indicates some sort of "label" by which to  
find the file in the distributed cache:

... -files URI_TO_FILE name ...

where 'name' is canonically the file name in the uri but without a  
scheme or path, just the filename.  How does that use case conform to  
your examples?

On 2010, Apr 11, at 11:12 PM, Amareshwari Sri Ramadasu wrote:

> Hi Keith Wiley,
>
> The -files option takes comma-separated files (passed as URIs) and makes
> them available on compute nodes for maps or reduces.
> For example,
>  -files file:///myfiles/file1,file:///myfiles/file2,hdfs://localhost:9000/files/dfsfile
>
> You can also pass a symlink name in the URI's fragment.
> For example,
>  -files file:///myfiles/file1#file1,file:///myfiles2/file1#file2
> But the second example does not work as expected in branch 0.20
> (see http://issues.apache.org/jira/browse/MAPREDUCE-787).
> I hope the above examples clear up your confusion.
>
> Thanks
> Amareshwari


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com     music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                            --  Edwin A. Abbott, Flatland
________________________________________________________________________________


Re: -files flag question

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Hi Keith Wiley,

The -files option takes comma-separated files (passed as URIs) and makes them available on compute nodes for maps or reduces.
For example,
  -files file:///myfiles/file1,file:///myfiles/file2,hdfs://localhost:9000/files/dfsfile

You can also pass a symlink name in the URI's fragment.
For example,
  -files file:///myfiles/file1#file1,file:///myfiles2/file1#file2
But the second example does not work as expected in branch 0.20 (see http://issues.apache.org/jira/browse/MAPREDUCE-787).
I hope the above examples clear up your confusion.

Thanks
Amareshwari
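
To make the syntax described above concrete, here is a rough Python sketch of how a single -files value breaks down into (URI, symlink name) pairs. It is only an illustration of the comma-and-fragment syntax, not Hadoop's own parsing code. The function name parse_files_option is made up for this example, and the fallback to the last path component when no fragment is given is an assumption based on the "just the filename" convention mentioned in the question.

```python
import posixpath
from urllib.parse import urlsplit


def parse_files_option(value):
    """Split one -files value into (uri, link_name) pairs.

    Hypothetical helper for illustration only; Hadoop does its own parsing.
    """
    pairs = []
    for entry in value.split(","):      # entries are comma-separated
        entry = entry.strip()
        if not entry:
            continue
        # Anything after '#' is the URI fragment, i.e. the symlink name.
        uri, _, fragment = entry.partition("#")
        # Assumption: with no fragment, use the last path component.
        link_name = fragment or posixpath.basename(urlsplit(uri).path)
        pairs.append((uri, link_name))
    return pairs


if __name__ == "__main__":
    value = "file:///myfiles/file1#file1,file:///myfiles2/file1#file2"
    for uri, name in parse_files_option(value):
        print(uri, "->", name)
```

The point is that the whole list is one argument to -files: entries are separated by commas, and any name is attached to its own URI with '#' rather than given as a separate space-separated argument.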

