Posted to dev@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2011/03/23 20:17:00 UTC

General question about FileSystem.makeQualified()

I'm seeing a lot of code that goes out of its way to make a Path in
Hadoop fully-qualified. It ends up taking a few lines of code. I
suspect some of it is spurious. I'm trying to confirm my understanding
of when you would need a fully-qualified path.

This seems to be necessary in general when sending around a Path, or
storing it, since a relative path is only partial information and
is valid only when the context (working directory) is known. Other
than that... shouldn't be too necessary?
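
For concreteness, here's a toy illustration of what qualification adds; the
path names and the default FS below are made up, not taken from our code:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifyDemo {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // The default FS comes from fs.default.name, e.g. hdfs://namenode:9000
    FileSystem fs = FileSystem.get(conf);
    // Ambiguous on its own: no scheme, no authority, no working directory
    Path relative = new Path("temp/itemIDIndex");
    // Qualified against fs, e.g. hdfs://namenode:9000/user/me/temp/itemIDIndex
    Path qualified = relative.makeQualified(fs);
    System.out.println(qualified);
  }
}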

I sort of ask since I'm looking at the following code, and wonder how much
of it is necessary. If I strip it down, it looks like this:

void foo(String pathString, Configuration conf) throws IOException {
  Path unqualified = new Path(pathString);
  FileSystem fs = FileSystem.get(unqualified.toUri(), conf);
  Path path = unqualified.makeQualified(fs);
  ...
  new SequenceFile.Reader(fs, new Path(path.toString()).makeQualified(fs), conf) ...
  ...
}

Since I presume SequenceFile.Reader itself makes sense of the path in
the context of "conf" anyway, all the rest seems redundant.
Or put another way, I don't see what these acrobatics can add --
whatever knowledge is in "conf" is already used deeper down in
SequenceFile.Reader.
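
Concretely, I'd have guessed something like the following would be enough
(just a sketch with a made-up method name; imports as in the toy example
above, plus org.apache.hadoop.io.SequenceFile):

void dumpKeyClass(String pathString, Configuration conf) throws IOException {
  Path path = new Path(pathString);
  // getFileSystem() picks the FS matching the path's scheme (hdfs://, s3n://,
  // file://, ...) and falls back to the default FS for scheme-less paths
  FileSystem fs = path.getFileSystem(conf);
  SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
  try {
    System.out.println(reader.getKeyClassName());
  } finally {
    reader.close();
  }
}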

But I recall there's some subtlety with, say, handling s3:// and
s3n:// URLs here?

Any comments on what's the right thing to do?

Re: General question about FileSystem.makeQualified()

Posted by Sean Owen <sr...@gmail.com>.
OK. I am in particular looking at
TasteHadoopUtils.readItemIDIndexMap(). Would this ever be fed a
composite path like that?

I think you're also suggesting that it never hurts to qualify the
path. So, the utility class SequenceFileIterable ought to do this.

Well, I'd rather err on the side of not breaking things, so I'll try not
to change behavior here.
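
(For reference, if SequenceFileIterable ever did want to qualify defensively,
I'd imagine it amounts to something like this; a sketch only, the helper name
is made up and this is not the actual class:)

static Path qualify(Path path, Configuration conf) throws IOException {
  // Pick the FileSystem matching the path's scheme and qualify against it,
  // making the path absolute and scheme/authority-complete
  FileSystem fs = path.getFileSystem(conf);
  return path.makeQualified(fs);
}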

On Wed, Mar 23, 2011 at 7:24 PM, Sebastian Schelter
<ss...@googlemail.com> wrote:
> Those code pieces are from me and they were necessary to make combined
> paths like this work on S3 for me:
>
> Path combined = new Path(pathA + "," + pathB);
>
> It's been a quick (and somewhat ugly) workaround; if someone knows a better
> solution I'd be happy to see it refactored.
>
> --sebastian
>
>
> On 23.03.2011 20:17, Sean Owen wrote:
>>
>> I'm seeing a lot of code that goes out of its way to make a Path in
>> Hadoop fully-qualified. It ends up taking a few lines of code. I
>> suspect some of it is spurious. I'm trying to confirm my understanding
>> of when you would need a fully-qualified path.
>>
>> This seems to be necessary in general when sending around a Path, or
>> storing it, since a relative path is only partial information and
>> is valid only when the context (working directory) is known. Other
>> than that... shouldn't be too necessary?
>>
>> I sort of ask since I'm looking at the following code, and wonder how much
>> of it is necessary. If I strip it down, it looks like this:
>>
>> void foo(String pathString, Configuration conf) throws IOException {
>>   Path unqualified = new Path(pathString);
>>   FileSystem fs = FileSystem.get(unqualified.toUri(), conf);
>>   Path path = unqualified.makeQualified(fs);
>>   ...
>>   new SequenceFile.Reader(fs, new Path(path.toString()).makeQualified(fs), conf) ...
>>   ...
>> }
>>
>> Since I presume SequenceFile.Reader itself makes sense of the path in
>> the context of "conf" anyway, all the rest seems redundant.
>> Or put another way, I don't see what these acrobatics can add --
>> whatever knowledge is in "conf" is already used deeper down in
>> SequenceFile.Reader.
>>
>> But I recall there's some subtlety with, say, handling s3:// and
>> s3n:// URLs here?
>>
>> Any comments on what's the right thing to do?
>
>

Re: General question about FileSystem.makeQualified()

Posted by Sebastian Schelter <ss...@googlemail.com>.
Those code pieces are from me and they were necessary to make combined
paths like this work on S3 for me:

Path combined = new Path(pathA + "," + pathB);

It's been a quick (and somewhat ugly) workaround; if someone knows a
better solution I'd be happy to see it refactored.
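
If someone does pick it up, the direction I'd imagine is to qualify each
comma-separated piece against its own FileSystem before re-joining. Just a
sketch, untested, and the helper name is made up:

static String qualifyCommaSeparated(String commaSeparated, Configuration conf)
    throws IOException {
  StringBuilder qualified = new StringBuilder();
  for (String piece : commaSeparated.split(",")) {
    Path p = new Path(piece);
    // Each piece resolves its own FS (s3n://, hdfs://, ...) from its scheme
    FileSystem fs = p.getFileSystem(conf);
    if (qualified.length() > 0) {
      qualified.append(',');
    }
    qualified.append(p.makeQualified(fs).toString());
  }
  return qualified.toString();
}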

--sebastian


On 23.03.2011 20:17, Sean Owen wrote:
> I'm seeing a lot of code that goes out of its way to make a Path in
> Hadoop fully-qualified. It ends up taking a few lines of code. I
> suspect some of it is spurious. I'm trying to confirm my understanding
> of when you would need a fully-qualified path.
>
> This seems to be necessary in general when sending around a Path, or
> storing it, since a relative path is only partial information and
> is valid only when the context (working directory) is known. Other
> than that... shouldn't be too necessary?
>
> I sort of ask since I'm looking at the following code, and wonder how much
> of it is necessary. If I strip it down, it looks like this:
>
> void foo(String pathString, Configuration conf) throws IOException {
>    Path unqualified = new Path(pathString);
>    FileSystem fs = FileSystem.get(unqualified.toUri(), conf);
>    Path path = unqualified.makeQualified(fs);
>    ...
>    new SequenceFile.Reader(fs, new Path(path.toString()).makeQualified(fs), conf) ...
>    ...
> }
>
> Since I presume SequenceFile.Reader itself makes sense of the path in
> the context of "conf" anyway, all the rest seems redundant.
> Or put another way, I don't see what these acrobatics can add --
> whatever knowledge is in "conf" is already used deeper down in
> SequenceFile.Reader.
>
> But I recall there's some subtlety with, say, handling s3:// and
> s3n:// URLs here?
>
> Any comments on what's the right thing to do?