Posted to common-user@hadoop.apache.org by Ling Kun <lk...@gmail.com> on 2013/04/11 12:33:02 UTC

What is the difference between URI, Home Directory, Working Directory in FileSystem.java or HDFS

Dear all,
   I am a little confused about the URI, Home Directory and Working
Directory in FileSystem.java and HDFS.

  I have listed my understanding of these concepts below; can someone please
tell me whether I am correct?  Thanks.

   The Home directory: This is usually a directory for a specific Hadoop
user, so its path is user specific. In HDFS, it looks like
hdfs://NameNode:port/user/USERNAME.
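A minimal sketch of how the home directory in this example is formed, assuming the HDFS default prefix /user (later Hadoop releases make it configurable through dfs.user.home.dir.prefix). HomeDirSketch and homeDirectory are hypothetical names used only for illustration; in real code FileSystem.getHomeDirectory() returns the qualified home directory of the current user.

```java
import java.net.URI;

// Hypothetical helper (not a Hadoop API) modelling the default HDFS home
// directory layout: <scheme>://<authority>/user/<username>.
final class HomeDirSketch {
    // Assumed default home prefix; HDFS allows this to be configured.
    static final String HOME_PREFIX = "/user";

    // Assumes userName contains only URI-safe characters.
    static URI homeDirectory(URI defaultFs, String userName) {
        // Keep only the scheme and authority of the filesystem URI,
        // then append the per-user directory under the home prefix.
        return URI.create(defaultFs.getScheme() + "://" + defaultFs.getAuthority()
                + HOME_PREFIX + "/" + userName);
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020");
        // Prints hdfs://namenode:8020/user/alice
        System.out.println(homeDirectory(defaultFs, "alice"));
    }
}
```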

   The URI: Is this the root of the distributed filesystem? For HDFS, it is
just hdfs://NameNode:port/, and each file or directory in the distributed
filesystem is a file or subdirectory under this path.
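To illustrate the idea that every file lives under the filesystem root URI, the sketch below uses only java.net.URI: the root keeps the scheme and authority with path "/", and URI.relativize recovers a file's path relative to that root. FsRootSketch is a hypothetical class, not part of Hadoop.

```java
import java.net.URI;

// Hypothetical helpers (not Hadoop code) illustrating that every file on a
// filesystem is a path under that filesystem's root URI.
final class FsRootSketch {
    // Assumes a fully qualified URI: the root shares its scheme and
    // authority, with path "/".
    static URI rootOf(URI qualified) {
        return URI.create(qualified.getScheme() + "://" + qualified.getAuthority() + "/");
    }

    // URI.relativize returns its argument unchanged when the scheme,
    // authority or path prefix does not match the base.
    static boolean isUnder(URI root, URI file) {
        return !root.relativize(file).equals(file);
    }

    public static void main(String[] args) {
        URI file = URI.create("hdfs://namenode:8020/user/alice/data.txt");
        URI root = rootOf(file);
        System.out.println(root);                  // hdfs://namenode:8020/
        System.out.println(root.relativize(file)); // user/alice/data.txt
        // A different NameNode is a different filesystem: prints false
        System.out.println(isUnder(root, URI.create("hdfs://othernn:8020/tmp/x")));
    }
}
```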

   The working directory: I am a little confused about this variable. At a
given time, there exists only one instance of the FileSystem class, and the
working dir is private state of that instance. While a job runs, Hadoop
switches among several dirs, such as the shared system dir, the home dir,
or the input/output dirs, and the working dir is modified each time it
switches.
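A minimal model of the working directory as mutable per-instance state, in the spirit of FileSystem.getWorkingDirectory() and FileSystem.setWorkingDirectory(Path). WorkingDirSketch is a hypothetical class that uses java.net.URI in place of Hadoop's Path. One caveat worth knowing: by default FileSystem.get(conf) returns a cached instance shared within the JVM for the same scheme, authority and user, so a working directory change made through one reference is seen by every caller holding that instance. The sketch reproduces that effect with two references to one object.

```java
import java.net.URI;

// Hypothetical model (not Hadoop code) of a filesystem object whose working
// directory is mutable per-instance state.
final class WorkingDirSketch {
    private URI workingDir;

    WorkingDirSketch(URI initialWorkingDir) {
        this.workingDir = initialWorkingDir;
    }

    URI getWorkingDirectory() {
        return workingDir;
    }

    // Changing the working directory mutates this instance, so every caller
    // holding a reference to it sees the new value. A relative argument is
    // resolved against the current working directory.
    void setWorkingDirectory(URI dir) {
        this.workingDir = asDirectory(workingDir).resolve(dir);
    }

    // Resolve a (possibly relative) path against the current working directory.
    URI resolve(String path) {
        return asDirectory(workingDir).resolve(path);
    }

    // URI.resolve drops the last path segment of the base unless it ends
    // with "/", so treat the working directory explicitly as a directory.
    private static URI asDirectory(URI dir) {
        String s = dir.toString();
        return s.endsWith("/") ? dir : URI.create(s + "/");
    }

    public static void main(String[] args) {
        WorkingDirSketch fs = new WorkingDirSketch(URI.create("hdfs://namenode:8020/user/alice"));
        WorkingDirSketch sharedRef = fs; // a second caller sharing the same instance

        // Prints hdfs://namenode:8020/user/alice/data.txt
        System.out.println(fs.resolve("data.txt"));

        sharedRef.setWorkingDirectory(URI.create("/tmp/job1"));
        // The first reference now resolves against the new directory:
        // prints hdfs://namenode:8020/tmp/job1/data.txt
        System.out.println(fs.resolve("data.txt"));
    }
}
```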



   Although I have looked through the related documentation, I am still a little
confused about the java.net.URI, java.io.File and
org.apache.hadoop.fs.Path classes. It seems a URI can be
hdfs://XXX/XXX/FILENAME, while a Path can only be a path without the
scheme, hostname and port. The File class is just an object
for a specific file.
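The following JDK-only sketch illustrates the distinction drawn here between java.net.URI and java.io.File: a URI carries scheme, host, port and path components, while File only models local filesystem paths, and its File(URI) constructor rejects any scheme other than "file". UriVsFileSketch and toLocalFile are hypothetical names.

```java
import java.io.File;
import java.net.URI;

final class UriVsFileSketch {
    // Returns a java.io.File for the URI, or null if the URI does not name
    // a local file (File(URI) only accepts absolute, hierarchical "file:" URIs).
    static File toLocalFile(URI uri) {
        try {
            return new File(uri);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://namenode:8020/user/alice/data.txt");
        System.out.println(hdfs.getScheme());  // hdfs
        System.out.println(hdfs.getHost());    // namenode
        System.out.println(hdfs.getPort());    // 8020
        System.out.println(hdfs.getPath());    // /user/alice/data.txt
        System.out.println(toLocalFile(hdfs)); // null: not a local file
        // A "file:" URI converts to a platform-specific local path.
        System.out.println(toLocalFile(URI.create("file:/tmp/data.txt")));
    }
}
```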



Thanks

yours,
Ling Kun

-- 
http://www.lingcc.com

Re: What is the difference between URI, Home Directory, Working Directory in FileSystem.java or HDFS

Posted by Ling Kun <lk...@gmail.com>.
Dear Daryn Sharp,
   Your reply helped me a lot in reading the code of HDFS and the FileSystem
interface.

   Thanks.

yours,
Ling Kun


On Thu, Apr 11, 2013 at 10:53 PM, Daryn Sharp <da...@yahoo-inc.com> wrote:

> On Apr 11, 2013, at 5:33 AM, Ling Kun wrote:
>
> > Dear all,
> >    I am a little confused about the URI, Home Directory and Working
> > Directory in FileSystem.java and HDFS.
> >
> >   I have listed my understanding of these concepts below; can someone
> > please tell me whether I am correct?  Thanks.
> >
> >    The Home directory: This is usually a directory for a specific Hadoop
> > user, so its path is user specific. In HDFS, it looks like
> > hdfs://NameNode:port/user/USERNAME.
>
> Correct.
>
> >    The URI: Is this the root of the distributed filesystem? For HDFS, it
> > is just hdfs://NameNode:port/, and each file or directory in the
> > distributed filesystem is a file or subdirectory under this path.
>
> Generally correct.  However, I'd strongly suggest avoiding the use of URIs
> directly.  It's better to obtain your filesystems via
> path.getFileSystem(conf) - it will extract the URI for the filesystem
> automatically.  See below for the correct definition of a Path.
>
> >    The working directory: I am a little confused about this variable. At
> > a given time, there exists only one instance of the FileSystem class, and
> > the working dir is private state of that instance. While a job runs,
> > Hadoop switches among several dirs, such as the shared system dir, the
> > home dir, or the input/output dirs, and the working dir is modified each
> > time it switches.
>
> Correct.
>
> >    Although I have looked through the related documentation, I am still a
> > little confused about the java.net.URI, java.io.File and
> > org.apache.hadoop.fs.Path classes. It seems a URI can be
> > hdfs://XXX/XXX/FILENAME, while a Path can only be a path without the
> > scheme, hostname and port. The File class is just an object
> > for a specific file.
>
> Your understanding of Path is incorrect.  Path is really just a veneer
> over a URI.  A Path can be qualified with a scheme/authority, or just be
> absolute or relative.  If a Path is not scheme qualified, it uses the
> defaultFS.  If the Path is not absolute, it's qualified against the working
> directory.  Path provides some niceties like not requiring percent encoding
> in the path portion of the URI, and allows use of glob chars and the
> quoting thereof.
>
> I hope this helps!
>
> Daryn





Re: What is the difference between URI, Home Directory, Working Directory in FileSystem.java or HDFS

Posted by Daryn Sharp <da...@yahoo-inc.com>.
On Apr 11, 2013, at 5:33 AM, Ling Kun wrote:

> Dear all,
>    I am a little confused about the URI, Home Directory and Working Directory in FileSystem.java and HDFS.
> 
>   I have listed my understanding of these concepts below; can someone please tell me whether I am correct?  Thanks.
> 
>    The Home directory: This is usually a directory for a specific Hadoop user, so its path is user specific. In HDFS, it looks like hdfs://NameNode:port/user/USERNAME.

Correct.

>    The URI: Is this the root of the distributed filesystem? For HDFS, it is just hdfs://NameNode:port/, and each file or directory in the distributed filesystem is a file or subdirectory under this path.

Generally correct.  However, I'd strongly suggest avoiding the use of URIs directly.  It's better to obtain your filesystems via path.getFileSystem(conf) - it will extract the URI for the filesystem automatically.  See below for the correct definition of a Path.
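The advice above is to call path.getFileSystem(conf), which picks the filesystem from the path's scheme and authority and falls back to the configured default filesystem (fs.defaultFS in Hadoop 2.x, fs.default.name in older releases). The JDK-only sketch below models just that selection step, including the case where a scheme-only path borrows the default authority, as I understand FileSystem.get(URI, Configuration) to behave. FsSelectSketch and filesystemUriFor are hypothetical names, and the real method returns a (usually cached) FileSystem object rather than a URI.

```java
import java.net.URI;

// Hypothetical model (not Hadoop code) of how a path's scheme and authority
// select a filesystem, falling back to the default filesystem URI.
final class FsSelectSketch {
    static URI filesystemUriFor(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        String authority = path.getAuthority();
        if (scheme == null) {
            // Unqualified path such as "/user/alice": use the default filesystem.
            // (Simplification: the rare "//host/path" form is not handled.)
            return root(defaultFs.getScheme(), defaultFs.getAuthority());
        }
        if (authority == null && scheme.equals(defaultFs.getScheme())
                && defaultFs.getAuthority() != null) {
            // Scheme-only path such as "hdfs:/user/alice": borrow the default authority.
            return root(defaultFs.getScheme(), defaultFs.getAuthority());
        }
        return root(scheme, authority);
    }

    // Builds scheme://authority/ (an absent authority gives e.g. file:///).
    private static URI root(String scheme, String authority) {
        return URI.create(scheme + "://" + (authority == null ? "" : authority) + "/");
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020");
        // hdfs://namenode:8020/
        System.out.println(filesystemUriFor(URI.create("/user/alice/data.txt"), defaultFs));
        // hdfs://othernn:9000/
        System.out.println(filesystemUriFor(URI.create("hdfs://othernn:9000/tmp/x"), defaultFs));
        // file:///
        System.out.println(filesystemUriFor(URI.create("file:///tmp/x"), defaultFs));
    }
}
```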

>    The working directory: I am a little confused about this variable. At a given time, there exists only one instance of the FileSystem class, and the working dir is private state of that instance. While a job runs, Hadoop switches among several dirs, such as the shared system dir, the home dir, or the input/output dirs, and the working dir is modified each time it switches.

Correct.

>    Although I have looked through the related documentation, I am still a little confused about the java.net.URI, java.io.File and org.apache.hadoop.fs.Path classes. It seems a URI can be hdfs://XXX/XXX/FILENAME, while a Path can only be a path without the scheme, hostname and port. The File class is just an object for a specific file.

Your understanding of Path is incorrect.  Path is really just a veneer over a URI.  A Path can be qualified with a scheme/authority, or just be absolute or relative.  If a Path is not scheme qualified, it uses the defaultFS.  If the Path is not absolute, it's qualified against the working directory.  Path provides some niceties like not requiring percent encoding in the path portion of the URI, and allows use of glob chars and the quoting thereof.
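The qualification rules described above can be modelled with java.net.URI alone. In this sketch (hypothetical PathQualifySketch.qualify, not Hadoop's Path.makeQualified), a fully qualified path is returned unchanged, a relative path is resolved against the working directory, and an absolute path without a scheme takes its scheme and authority from the default filesystem. The toUri helper uses the multi-argument URI constructor, which percent-encodes characters such as spaces; this mirrors the convenience described above, where a Path string does not need to be pre-encoded. Glob characters and their quoting are not modelled.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical model (not Hadoop's Path.makeQualified) of the qualification
// rules: fully qualified paths are kept, relative paths are resolved against
// the working directory, and a missing scheme/authority comes from defaultFS.
final class PathQualifySketch {
    // Build a URI from an unencoded, scheme-less path string. The
    // multi-argument constructor percent-encodes characters such as spaces,
    // whereas URI.create("data file.txt") would throw.
    static URI toUri(String path) {
        try {
            return new URI(null, null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static URI qualify(URI path, URI defaultFs, URI workingDir) {
        URI p = path;
        if (p.getScheme() == null && !p.getPath().startsWith("/")) {
            // Relative path: resolve it against the working directory.
            p = asDirectory(workingDir).resolve(p);
        }
        if (p.getScheme() != null && p.getAuthority() != null) {
            return p; // already fully qualified
        }
        String scheme = p.getScheme() != null ? p.getScheme() : defaultFs.getScheme();
        // Simplification: only borrow the default authority for the default scheme.
        String authority = scheme.equals(defaultFs.getScheme()) ? defaultFs.getAuthority() : null;
        try {
            return new URI(scheme, authority, p.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Ensure a trailing "/" so URI.resolve keeps the last directory segment.
    private static URI asDirectory(URI dir) {
        String s = dir.toString();
        return s.endsWith("/") ? dir : URI.create(s + "/");
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020");
        URI workingDir = URI.create("hdfs://namenode:8020/user/alice");
        // hdfs://namenode:8020/user/alice/data%20file.txt
        System.out.println(qualify(toUri("data file.txt"), defaultFs, workingDir));
        // hdfs://namenode:8020/tmp/out
        System.out.println(qualify(toUri("/tmp/out"), defaultFs, workingDir));
        // hdfs://othernn:9000/x (unchanged)
        System.out.println(qualify(URI.create("hdfs://othernn:9000/x"), defaultFs, workingDir));
    }
}
```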

I hope this helps!

Daryn
