Posted to common-user@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2010/04/10 01:14:46 UTC
-files flag question
I'm a little confused about how the -files flag works. My understanding is that it takes two arguments: a file URI (local or on HDFS, assumed local if no URI scheme is provided) and a short "tag" representing the file in the distributed cache, usually just the file name without the long path that precedes it in the URI.
But -files can also pass multiple files to the distributed cache, so how does this all fit together? Are odd arguments all URIs and even arguments all cache tags? Is it that simple? I'm not sure how to put it together if I need to send several files to the distributed cache (several shared libraries, for example).
Thanks.
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com www.keithwiley.com
"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive compulsive and debilitatingly slow."
-- Keith Wiley
________________________________________________________________________________
Re: -files flag question
Posted by Keith Wiley <kw...@keithwiley.com>.
So how does the example work where the second argument is simply
separated by a space and indicates some sort of "label" by which to
find the file in the distributed cache:
... -files URI_TO_FILE name ...
where 'name' is canonically the file name in the URI but without a
scheme or path, just the filename? How does that use case conform to
your examples?
On 2010, Apr 11, at 11:12 PM, Amareshwari Sri Ramadasu wrote:
> Hi Keith Wiley,
>
> -files option takes comma separated files (passed as URIs) to make
> them available on compute nodes for maps or reduces.
> For example,
> -files file:///myfiles/file1,file:///myfiles/file2,hdfs://localhost:9000/files/dfsfile
>
> You can also pass a symlink name in the URI's fragment.
> For example,
> -files file:///myfiles/file1#file1,file:///myfiles2/file1#file2
> But the second example does not work as expected in branch 0.20
> (see http://issues.apache.org/jira/browse/MAPREDUCE-787).
> I hope the above examples clear up your confusion.
>
> Thanks
> Amareshwari
>
>
> On 4/10/10 4:44 AM, "Keith Wiley" <kw...@keithwiley.com> wrote:
>
> I'm a little confused how the -files flag works. My understanding
> is that it takes two arguments: a file URI (could be local or on
> HDFS, assumed local if no URI scheme is provided) and a short "tag"
> representing the file on the distributed cache, usually just the
> name of the file without the long path that precedes it in the URI.
>
> But, -files can also pass multiple files to the distributed cache,
> so, how does this all go together. Are odd arguments all URIs and
> even arguments all cache-tags? Is it that simple? I'm not really
> sure how to fit it all together if I need to send several files to
> the distributed cache (several shared libraries for example).
>
>
>
>
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com
music.keithwiley.com
"Yet mark his perfect self-contentment, and hence learn his lesson,
that to be self-contented is to be vile and ignorant, and that to aspire is
better than to be blindly and impotently happy."
-- Edwin A. Abbott, Flatland
________________________________________________________________________________
Re: -files flag question
Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Hi Keith Wiley,
The -files option takes a comma-separated list of files (passed as URIs) and makes them available on the compute nodes for maps or reduces.
For example,
-files file:///myfiles/file1,file:///myfiles/file2,hdfs://localhost:9000/files/dfsfile
You can also pass a symlink name in the URI's fragment.
For example,
-files file:///myfiles/file1#file1,file:///myfiles2/file1#file2
But the second example does not work as expected in branch 0.20 (see http://issues.apache.org/jira/browse/MAPREDUCE-787).
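To make the syntax concrete, here is a small Python sketch of how a -files value of this shape can be read: one argument, split on commas, with the part after '#' taken as the symlink name. This is only an illustration and not Hadoop's own parsing code. The function name parse_files_option is hypothetical, and falling back to the file's base name when no fragment is given is an assumption on my part.

```python
from urllib.parse import urlsplit
import posixpath


def parse_files_option(value):
    """Split a comma-separated -files value into (source_uri, link_name) pairs.

    Illustrative only: this mimics the syntax described in this thread,
    not Hadoop's actual implementation.
    """
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        parts = urlsplit(item)
        # Assumption: with no '#fragment', the base name of the path is used.
        link_name = parts.fragment or posixpath.basename(parts.path)
        # The source URI is everything before the '#'.
        source_uri = item.split("#", 1)[0]
        entries.append((source_uri, link_name))
    return entries


if __name__ == "__main__":
    value = "file:///myfiles/file1#file1,file:///myfiles2/file1#file2"
    for source_uri, link_name in parse_files_option(value):
        print(link_name, "->", source_uri)
```

Read this way, there is no odd/even pairing of space-separated arguments: each comma-separated entry carries its own optional name after the '#'.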
I hope the above examples clear up your confusion.
Thanks
Amareshwari
On 4/10/10 4:44 AM, "Keith Wiley" <kw...@keithwiley.com> wrote:
> I'm a little confused how the -files flag works. My understanding is that it takes two arguments: a file URI (could be local or on HDFS, assumed local if no URI scheme is provided) and a short "tag" representing the file on the distributed cache, usually just the name of the file without the long path that precedes it in the URI.
>
> But, -files can also pass multiple files to the distributed cache, so, how does this all go together. Are odd arguments all URIs and even arguments all cache-tags? Is it that simple? I'm not really sure how to fit it all together if I need to send several files to the distributed cache (several shared libraries for example).