Posted to common-dev@hadoop.apache.org by "Sameer Paranjpye (JIRA)" <ji...@apache.org> on 2007/10/10 22:50:50 UTC

[jira] Created: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Instantiating a FileSystem object should guarantee the existence of the working directory
-----------------------------------------------------------------------------------------

                 Key: HADOOP-2025
                 URL: https://issues.apache.org/jira/browse/HADOOP-2025
             Project: Hadoop
          Issue Type: Improvement
          Components: fs
    Affects Versions: 0.14.1
            Reporter: Sameer Paranjpye
             Fix For: 0.16.0


Issues like HADOOP-1891 and HADOOP-1916 illustrate the need for this behavior.

In HADOOP-1916 the problem is that the default working directory for a user on HDFS, '/user/<username>', does not exist. As a result, the command 'hadoop dfs -copyFromLocal foo .' creates a *file* called /user/<username> and copies the contents of the file 'foo' into it.

HADOOP-1891 is basically the same problem. Olga observed that copying a file to '.' on HDFS when her 'home directory' did not exist created a file whose path was her home directory. The problem is incorrectly filed as a bug in the Path class. The behavior of Path is correct: as Doug points out, it is perfectly reasonable for Path(".") to convert to an empty path. When this empty path is resolved in HDFS or any other filesystem, the resolution to '/user/<username>' is also correct (at least for HDFS). The problem, IMO, is that the existence of the working directory is not guaranteed.
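
To make the failure mode concrete, here is a minimal Java sketch of it. It is a toy in-memory namespace, not Hadoop code: the class CopyToDotDemo, its fields, and the cp-like rule in copyFromLocal are all assumptions made for illustration, and the username 'alice' is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy namespace mapping each path to "dir" or "file". Not Hadoop code. */
class CopyToDotDemo {
    final Map<String, String> entries = new HashMap<>();
    final String workingDir;

    CopyToDotDemo(String workingDir) {
        this.workingDir = workingDir;
        entries.put("/", "dir"); // only the root is guaranteed to exist
    }

    /** "." or an empty path resolves to the working directory, existing or not. */
    String resolve(String path) {
        if (path.startsWith("/")) {
            return path;
        }
        if (path.isEmpty() || path.equals(".")) {
            return workingDir;
        }
        return workingDir.endsWith("/") ? workingDir + path : workingDir + "/" + path;
    }

    /**
     * Assumed cp-like rule: copy into dst when dst is a directory,
     * otherwise create dst itself as a file. Returns the path written.
     */
    String copyFromLocal(String localName, String dst) {
        String target = resolve(dst);
        if ("dir".equals(entries.get(target))) {
            target = target.endsWith("/") ? target + localName : target + "/" + localName;
        }
        entries.put(target, "file");
        return target;
    }

    public static void main(String[] args) {
        // Home directory missing: the copy lands on the home path itself.
        CopyToDotDemo missing = new CopyToDotDemo("/user/alice");
        System.out.println(missing.copyFromLocal("foo", "."));

        // Home directory present: the copy lands inside it.
        CopyToDotDemo present = new CopyToDotDemo("/user/alice");
        present.entries.put("/user/alice", "dir");
        System.out.println(present.copyFromLocal("foo", "."));
    }
}
```

In this model nothing in Path resolution is wrong; the surprising result comes only from the working directory not existing when the copy runs.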

When I log in to a machine, my default working directory is '/home/sameerp', and filesystem operations that I execute with relative paths all work correctly because this directory exists. My home directory lives on a filer; if it cannot be mounted, the default working directory I get is '/', which is also guaranteed to exist.

In the context of Hadoop, instantiating a FileSystem object is the analogue of logging in and should result in a working directory whose existence has been validated. In the case of HDFS this should be '/user/<username>', or '/' if that directory does not exist.
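
The proposed rule can be sketched in a few lines of Java. This is a toy model under stated assumptions, not the Hadoop FileSystem API: WorkingDirChooser, mkdir, isDirectory, and initialWorkingDirectory are hypothetical names, and the namespace is just an in-memory set of directory paths.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model, not the Hadoop API: pick a working directory known to exist. */
class WorkingDirChooser {
    /** Hypothetical stand-in for a namespace: the set of existing directories. */
    private final Set<String> directories = new TreeSet<>();

    WorkingDirChooser() {
        directories.add("/"); // the root always exists
    }

    void mkdir(String path) {
        directories.add(path);
    }

    boolean isDirectory(String path) {
        return directories.contains(path);
    }

    /** Returns /user/<username> when it exists, otherwise falls back to "/". */
    String initialWorkingDirectory(String username) {
        String home = "/user/" + username;
        return isDirectory(home) ? home : "/";
    }

    public static void main(String[] args) {
        WorkingDirChooser fs = new WorkingDirChooser();
        System.out.println(fs.initialWorkingDirectory("sameerp")); // home is missing
        fs.mkdir("/user/sameerp");
        System.out.println(fs.initialWorkingDirectory("sameerp")); // home now exists
    }
}
```

Either way, the caller ends up with a working directory that has been checked, which is the guarantee being asked for here.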


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2025:
----------------------------------

    Status: Open  (was: Patch Available)

> Instantiating a FileSystem object should guarantee the existence of the working directory
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2025
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2025
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Chris Douglas
>             Fix For: 0.16.0
>
>         Attachments: 2025-1.patch, 2025.patch


[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542589 ] 

Doug Cutting commented on HADOOP-2025:
--------------------------------------

> Do you prefer throwing an IOException to an IllegalArgumentException in setWorkingDirectory?

I don't have a strong preference, but do prefer solutions with less code.  I think using IOException results in less code, and would give it the nod for that reason.



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555764#action_12555764 ] 

Sanjay Radia commented on HADOOP-2025:
--------------------------------------

Autocreating working directories:
This will not work with permissions (HADOOP-1298) because the client side may not have permission
to create the working dir. E.g. /users is NOT world writable.

Throwing an exception when a working dir path is used but does not exist is probably a better solution.
It mirrors the POSIX semantics of not being able to chdir to a directory that does not exist.
This would require admins to religiously create home dirs (and trash bins) for all their users.
Some existing map-reduce jobs may fail, but we should fix them and have the job framework check that the working dir exists before starting the job/task.
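
A chdir-like check could look roughly like the sketch below. It uses java.nio.file against the local filesystem purely for illustration; CheckedWorkingDir and its methods are hypothetical names, not the Hadoop FileSystem API, and using IOException for the failure is one of the options discussed in this thread rather than a settled choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: a working-directory holder with chdir-like checking. */
class CheckedWorkingDir {
    // Start from the process's current directory.
    private Path workingDir = Paths.get("").toAbsolutePath();

    /** Rejects paths that do not exist or are not directories, as chdir does. */
    void setWorkingDirectory(Path dir) throws IOException {
        Path absolute = dir.isAbsolute() ? dir : workingDir.resolve(dir);
        if (!Files.isDirectory(absolute)) {
            throw new IOException("Not an existing directory: " + absolute);
        }
        workingDir = absolute.normalize();
    }

    Path getWorkingDirectory() {
        return workingDir;
    }

    public static void main(String[] args) {
        CheckedWorkingDir wd = new CheckedWorkingDir();
        try {
            wd.setWorkingDirectory(Paths.get("/no/such/dir/hadoop-2025-example"));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape the error surfaces when the working directory is set, not later when a relative path is silently resolved against a directory that is not there.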





[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534177 ] 

Chris Douglas commented on HADOOP-2025:
---------------------------------------

Automatically creating a working directory may become awkward with permissions and when delegating tasks to an agent. Further, creating a default place for a user's data should be part of adding that user to the system, not executing a task.

Requesting a "home" directory for a given set of credentials, rather than a default "working directory", from a FileSystem seems more correct; the working directory seems like FileSystem state owned by an application (i.e. the FileSystem object). If one wants to resolve relative paths, the working directory must first be set on the particular instance.

This way, relative Paths can only be resolved against a FileSystem where the working directory is set, absolute Paths are always OK, FileSystems can return a default directory for a given user (but not in general), and all Paths from a FileSystem are fully qualified (HADOOP-1909).

At the moment, the working directory is set by the TaskTracker (to the property provided) and by IsolationRunner (for local, temporary storage). It is used sparingly, but notably by applications like FsShell and distcp (where reasonable defaults can be set and checked). Are there other places where this is relied on that might make effecting this change more difficult?
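
The separation between a per-user home directory and per-instance working-directory state might be sketched as follows. This is a toy model with plain string paths; PathResolver, homeDirectory, and resolve are hypothetical names and do not describe the existing FileSystem API, and the username 'alice' is made up.

```java
/** Toy model of the proposal; names are hypothetical, not Hadoop's API. */
class PathResolver {
    // Null until the application sets it explicitly.
    private String workingDir;

    /** Default "home" location for a user, following the /user/<username> convention. */
    static String homeDirectory(String username) {
        return "/user/" + username;
    }

    void setWorkingDirectory(String absoluteDir) {
        if (!absoluteDir.startsWith("/")) {
            throw new IllegalArgumentException("Working directory must be absolute: " + absoluteDir);
        }
        workingDir = absoluteDir;
    }

    /** Absolute paths are always fine; relative paths need a working directory. */
    String resolve(String path) {
        if (path.startsWith("/")) {
            return path;
        }
        if (workingDir == null) {
            throw new IllegalStateException("No working directory set; cannot resolve: " + path);
        }
        if (path.isEmpty() || path.equals(".")) {
            return workingDir;
        }
        return workingDir.endsWith("/") ? workingDir + path : workingDir + "/" + path;
    }

    public static void main(String[] args) {
        PathResolver fs = new PathResolver();
        System.out.println(fs.resolve("/tmp/data"));   // absolute: no state needed
        fs.setWorkingDirectory(homeDirectory("alice"));
        System.out.println(fs.resolve("foo"));         // relative: uses the set directory
    }
}
```

An application such as a shell would call setWorkingDirectory once with a checked default, so every path it hands out afterwards is fully qualified.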



[jira] Updated: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2025:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556056#action_12556056 ] 

Doug Cutting commented on HADOOP-2025:
--------------------------------------

> They require that the input exists and is readable, and that the output location is writable.

I've added HADOOP-2528 for this.



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533918 ] 

Sameer Paranjpye commented on HADOOP-2025:
------------------------------------------

Creating '/user/<username>/' or throwing an exception would both be reasonable alternatives. This behavior can probably be FileSystem implementation dependent.



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543231 ] 

Hadoop QA commented on HADOOP-2025:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12369642/2025-1.patch
against trunk revision r595563.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1105/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1105/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1105/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1105/console

This message is automatically generated.



[jira] Updated: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2025:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Assigned: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas reassigned HADOOP-2025:
-------------------------------------

    Assignee: Chris Douglas

> Instantiating a FileSystem object should guarantee the existence of the working directory
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2025
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2025
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Chris Douglas
>             Fix For: 0.16.0
>
>         Attachments: 2025-1.patch, 2025.patch
>
>
> Issues like HADOOP-1891 and HADOOP-1916 illustrate the need for this behavior.
> In HADOOP-1916 the problem is that the default working directory for a user on HDFS '/user/<username>' does not exist. This results in the command 'hadoop dfs -copyFromLocal foo ." creating a *file* called /user/<username> and copying the contents of the file 'foo' into this file.
> HADOOP-1891 is basically the same problem. The problem that Olga observed was that copying a file to '.' on HDFS when her 'home directory' did not exist resulted in the creation of a file with the path as her home directory. The problem is incorrectly filed as a bug in the Path class. The behavior of Path is correct, as Doug points out, it is perfectly reasonable for Path(".") to convert to an empty path. When this empty path is resolved in HDFS or any other filesystem the resolution to '/user/<username>' is also correct (at least for HDFS). The problem IMO is that the existence of the working directory is not guaranteed.
> When I log in to a machine my default working directory is '/home/sameerp' and filesystem operations that I execute with relative paths all work correctly because this directory exists. My home directory lives on a filer, in the event of it being unmountable the default working directory I get is '/' which also is guaranteed to exist.
> In the context of Hadoop, instantiating a FileSystem object is the analogue of logging in and should result in a working directory whose existence has been validated. In the case of HDFS this should be '/user/<username>' or '/' if the directory does not exist.



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533892 ] 

Doug Cutting commented on HADOOP-2025:
--------------------------------------

Maybe it should attempt to create '/user/<username>/' before it starts using '/'?  I worry about '/' getting polluted on shared filesystems each time a new user comes online.
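
A minimal sketch of the ordering suggested here, assuming a toy in-memory model rather than Hadoop's FileSystem API. The class HomeDirFallbackSketch, its canCreateUnderUser flag, and chooseWorkingDir() are hypothetical names for illustration only. The point is that '/' is used only when '/user/<username>' neither exists nor can be created:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory model; not Hadoop's FileSystem API. */
class HomeDirFallbackSketch {
  private final Set<String> dirs = new HashSet<>();
  // Stands in for whether this user may create directories under /user.
  private final boolean canCreateUnderUser;

  HomeDirFallbackSketch(boolean canCreateUnderUser) {
    dirs.add("/"); // the root always exists
    this.canCreateUnderUser = canCreateUnderUser;
  }

  boolean exists(String dir) {
    return dirs.contains(dir);
  }

  /** Returns true if the directory was created. */
  boolean mkdirs(String dir) {
    if (!canCreateUnderUser) {
      return false;
    }
    dirs.add(dir);
    return true;
  }

  /** Prefer the home directory, creating it if needed; fall back to "/" last. */
  String chooseWorkingDir(String user) {
    String home = "/user/" + user;
    if (exists(home) || mkdirs(home)) {
      return home;
    }
    return "/";
  }
}
```

With this ordering, a shared root only becomes the working directory for users whose home directory genuinely cannot be created.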



[jira] Updated: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2025:
----------------------------------

    Attachment: 2025.patch

I took a shot at implementing your idea. I left S3 alone; S3FileSystemBaseTest::testWorkingDirectory() looks like it has particular semantics I ought not to mess with.

The patch throws an IllegalArgumentException from setWorkingDirectory if the requested dir doesn't exist, and an IllegalStateException from getWorkingDirectory if the working dir is unset (i.e. the default was not found during initialize). It assumes that the default working directory will almost always exist.

It looks OK, but it sounded better. Was this what you had in mind?



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543697 ] 

Doug Cutting commented on HADOOP-2025:
--------------------------------------

I'd assumed we'd put a call to setWorkingDir(getDefaultWorkingDir()) in FileSystem#get(), just after the call to initialize().  We might first check that it exists, and leave it unset otherwise, so that folks who never rely on the working dir won't fail unless they attempt to access it.

Then getDefaultWorkingDir() could be protected: no one should ever need to call it.  Why do you have calls to getDefaultWorkingDir() instead of getWorkingDir()?

Also, the default working dir for Hftp should be the same as for HDFS, not "/".  I wonder if we should make that (/home/$user) the default for all filesystems, and then only override it in RawLocalFileSystem and FilterFileSystem.  That would further simplify the implementation of most FileSystems.
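
A minimal sketch of the "leave it unset" variant described in the first paragraph of this comment, assuming a toy string-based model. LazyWorkingDirSketch, its get() factory, and resolve() are hypothetical and only stand in for FileSystem#get() and relative-path resolution. Callers that only use absolute paths never touch the working dir, so they do not fail when it is unset:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a filesystem whose working dir may be left unset. */
class LazyWorkingDirSketch {
  private final Set<String> dirs = new HashSet<>();
  private final String defaultWorkingDir;
  private String workingDir = null;

  private LazyWorkingDirSketch(String defaultWorkingDir, String... existingDirs) {
    this.defaultWorkingDir = defaultWorkingDir;
    dirs.add("/");
    for (String d : existingDirs) {
      dirs.add(d);
    }
  }

  /** Plays the role of the factory: set the default only if it exists. */
  static LazyWorkingDirSketch get(String defaultWorkingDir, String... existingDirs) {
    LazyWorkingDirSketch fs = new LazyWorkingDirSketch(defaultWorkingDir, existingDirs);
    // In the real factory, initialize() would have run before this check.
    if (fs.dirs.contains(fs.defaultWorkingDir)) {
      fs.workingDir = fs.defaultWorkingDir;
    }
    return fs;
  }

  /** Fails only when someone actually asks for the working dir. */
  String getWorkingDir() {
    if (workingDir == null) {
      throw new IllegalStateException("working dir unset");
    }
    return workingDir;
  }

  /** Absolute paths are returned as-is; relative ones need the working dir. */
  String resolve(String path) {
    if (path.startsWith("/")) {
      return path;
    }
    String base = getWorkingDir();
    return base.endsWith("/") ? base + path : base + "/" + path;
  }
}
```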



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544008 ] 

Doug Cutting commented on HADOOP-2025:
--------------------------------------

> I left it public so the Trash would default to /user/$user instead of the cwd [ ... ]

We should uniformly resolve relative paths to the connected directory, no?

> it also leaves open the possibility of applications (like FsShell) creating the default working directory if it doesn't exist or permitting operations like "dfs -mkdirs ."

Let's revisit that then.  We should err on the conservative side when it comes to adding new public methods.  And if we want to later auto-create working directories, we can do that in getDefaultWorkingDir() itself if we want to make that feature FileSystem-specific, or in FileSystem#get() if we want to make it universal.  So I don't see that making getDefaultWorkingDir() public is required to implement this.

> Further, the Trash is initialized before the FileSystem [...] Things are similar in JobTracker initialization, where we encounter an infinite loop [...]

Can you please elaborate on this?  What're the methods in the loop?  Are there other ways to break it besides having folks use getDefaultWorkingDir() to resolve relative paths?  Why are Trash and JobTracker any different from other uses of FileSystem that must resolve relative paths?



[jira] Updated: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2025:
----------------------------------

    Attachment: 2025-1.patch



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542295 ] 

Doug Cutting commented on HADOOP-2025:
--------------------------------------

This looks mostly good to me.  I don't understand some of the changes in the test code.  Are these required for this issue?

> It looks OK, but it sounded better.

Not sure what you mean.  Do you have reservations about this approach?

One thing I'm not wild about is the amount of duplicate code.  I wonder whether much of this could be moved to the base FileSystem class. The only thing that's really FileSystem-specific is the default working dir.  This might look something like:
{noformat}
// null means the working dir is unset.
private Path workingDir = null;
public Path getWorkingDir() {
  if (workingDir == null) {
    throw new IllegalStateException("working dir unset");
  }
  return workingDir;
}
// Rejects directories that do not exist, so a set working dir is always valid.
public void setWorkingDir(Path p) throws IOException {
  if (!exists(p)) {
    throw new IOException("working dir does not exist: " + p);
  }
  workingDir = p;
}
// The only FileSystem-specific part; implementations override this.
protected Path getDefaultWorkingDir() { return new Path("/"); }
{noformat}

Then most FileSystem implementations would just override getDefaultWorkingDir.  FileSystem.java would call setWorkingDir(getDefaultWorkingDir()) after initialize().

Could that work?
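
A self-contained sketch of how this proposal might fit together, assuming paths are plain strings and the filesystem is an in-memory set of directories. SketchFs, SketchDfs, and the static get() helper are hypothetical stand-ins for the base FileSystem class, an HDFS-like implementation, and the factory in FileSystem.java:

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the base FileSystem class. */
abstract class SketchFs {
  protected final Set<String> dirs = new HashSet<>();
  private String workingDir = null;

  /** Implementations set up their state here. */
  abstract void initialize();

  /** The only filesystem-specific piece: where a new instance starts. */
  protected String getDefaultWorkingDir() {
    return "/";
  }

  boolean exists(String p) throws IOException {
    return dirs.contains(p);
  }

  String getWorkingDir() {
    if (workingDir == null) {
      throw new IllegalStateException("working dir unset");
    }
    return workingDir;
  }

  void setWorkingDir(String p) throws IOException {
    if (!exists(p)) {
      throw new IOException("working dir does not exist: " + p);
    }
    workingDir = p;
  }

  /** Plays the role of the factory: initialize, then validate the default. */
  static <T extends SketchFs> T get(T fs) throws IOException {
    fs.initialize();
    fs.setWorkingDir(fs.getDefaultWorkingDir());
    return fs;
  }
}

/** An HDFS-like implementation that overrides only the default working dir. */
class SketchDfs extends SketchFs {
  private final String user;

  SketchDfs(String user) {
    this.user = user;
  }

  @Override
  void initialize() {
    dirs.add("/");
    dirs.add("/user/" + user); // assume the home directory already exists
  }

  @Override
  protected String getDefaultWorkingDir() {
    return "/user/" + user;
  }
}
```

In this shape the validation logic lives in one place, and each implementation contributes a single overridden method.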




[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542566 ] 

Chris Douglas commented on HADOOP-2025:
---------------------------------------

The changes to the test code do three things: accommodate the Trash, which attempts to use the working directory if the default path isn't absolute (TestReplicationPolicy, MiniDFSCluster); correct a transaction count that was thrown off by creating a working directory in MiniDFSCluster (TestEditLog); and accommodate the new semantics of setWorkingDirectory (TestLocalDFS). If there's a less kludgy way to pass the first and second tests, I'd be open to it.

> Not sure what you mean. Do you have reservations about this approach?

Not with the approach, but this impl wasn't fitting well with cases like that in HADOOP-1916, where FsShell might create a default working dir if it didn't exist. This is cleanly solved by adding the FileSystem::getDefaultWorkingDir() method you suggested. I'll rework the patch.

Do you prefer throwing an IOException to an IllegalArgumentException in setWorkingDirectory? I went back and forth on that, but ultimately wanted to distinguish between problems with FileSystem::exists() and a non-existent path (e.g. RawInMemoryFileSystem).
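
A minimal sketch of the distinction being weighed here, assuming a toy in-memory model. WorkingDirErrorsSketch and its setReachable() switch are hypothetical. A failure inside exists() surfaces as an IOException, while a path that is simply absent surfaces as an IllegalArgumentException, so callers can tell the two apart:

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model separating "filesystem trouble" from "no such directory". */
class WorkingDirErrorsSketch {
  private final Set<String> dirs = new HashSet<>();
  private boolean reachable = true;
  private String workingDir = null;

  WorkingDirErrorsSketch() {
    dirs.add("/");
  }

  /** Simulates the filesystem becoming unavailable. */
  void setReachable(boolean reachable) {
    this.reachable = reachable;
  }

  /** Throws IOException when the check itself cannot be performed. */
  boolean exists(String p) throws IOException {
    if (!reachable) {
      throw new IOException("filesystem unreachable");
    }
    return dirs.contains(p);
  }

  /** Throws IllegalArgumentException only when the check ran and found nothing. */
  void setWorkingDirectory(String p) throws IOException {
    if (!exists(p)) {
      throw new IllegalArgumentException("no such directory: " + p);
    }
    workingDir = p;
  }

  String getWorkingDirectory() {
    if (workingDir == null) {
      throw new IllegalStateException("working dir unset");
    }
    return workingDir;
  }
}
```

Using IOException for both cases, as in the earlier suggestion, would be simpler for callers but would merge the two failure modes.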



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543275 ] 

Hadoop QA commented on HADOOP-2025:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12369704/2025-1.patch
against trunk revision r595563.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1112/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1112/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1112/console

This message is automatically generated.



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543731 ] 

Chris Douglas commented on HADOOP-2025:
---------------------------------------

> Also, the default working dir for Hftp should be the same as for HDFS, not "/". I wonder if we should make that (/home/$user) the default for all filesystems, and then only override it in RawLocalFileSystem and FilterFileSystem.

Re: Hftp: yes, that should default to /user/$user, as with DistributedFileSystem; I'll correct that. KosmosFileSystem also differs from this default... I'm tempted to just leave it abstract, but if you'd prefer a default of /user/$user in FileSystem, that's fine.

> Then getDefaultWorkingDir() could be protected: no one should ever need to call it. Why do you have calls to getDefaultWorkingDir() instead of getWorkingDir()?

I left it public so the Trash would default to /user/$user instead of the cwd; it also leaves open the possibility of applications (like FsShell) creating the default working directory if it doesn't exist or permitting operations like "dfs -mkdirs .".

Further, the Trash is initialized before the FileSystem, so a relative path needs to resolve to something before initialize() is called. Things are similar in JobTracker initialization, where we encounter an infinite loop (exercised by TestSocketFactory) if the system dir is relative and the working dir is unset. Since the default is absolute, using it here seems appropriate, given that we create the working dir as a side-effect.

Finally, to preserve InMemoryFileSystem's init semantics, it seems necessary to set the working dir in initialize(), not after it. A workaround is clearly possible, but at first glance it seems less attractive than the current patch. Have I misread this?



[jira] Updated: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2025:
----------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556030#action_12556030 ] 

Doug Cutting commented on HADOOP-2025:
--------------------------------------

> the client side may not have permission to create the working dir.

Then that would throw an exception.  Or we could explicitly check for that, as you suggest, if you think that would yield a more user-friendly exception.

> This would require admins to religiously create home dirs (and trash-bins) for all users.

Home dirs, yes, but, if the home dir exists, can't a user create his own trash there on demand?

> have the job framework check that the working dir exists before starting the job/task.

Do mapred jobs require that the working dir exist?  They require that the input exists and is readable, and that the output location is writable.  Now that we have permissions these checks could be improved.  FileInputFormat#validateInput() could check readability.  And OutputFormatBase#checkOutputSpecs() could check that the parent of the output directory is writable.  That's probably a separate issue, no?
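
The two checks suggested above (input readable, parent of the output location writable) might look roughly like this. It is a sketch only: it uses java.nio.file on the local filesystem rather than the Hadoop FileSystem or the FileInputFormat/OutputFormatBase classes, and JobPathChecks and its methods are hypothetical names.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-submission checks for illustration; not Hadoop code. */
final class JobPathChecks {
    private JobPathChecks() {}

    /** Fails if the input path is missing or cannot be read. */
    static void checkInputReadable(Path input) throws IOException {
        if (!Files.exists(input)) {
            throw new FileNotFoundException("Input path does not exist: " + input);
        }
        if (!Files.isReadable(input)) {
            throw new AccessDeniedException(input.toString());
        }
    }

    /** Fails if the parent of the output path is missing or not writable. */
    static void checkOutputParentWritable(Path output) throws IOException {
        // The root has no parent, so treat that case as "no usable parent".
        Path parent = output.toAbsolutePath().getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            throw new FileNotFoundException("Output parent does not exist: " + output);
        }
        if (!Files.isWritable(parent)) {
            throw new AccessDeniedException(parent.toString());
        }
    }
}
```

Neither check depends on the working directory existing, which is consistent with treating these improvements as a separate issue.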




[jira] Updated: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2025:
----------------------------------

    Attachment: 2025-1.patch

Accommodate findbugs (use explicit null instead of known null)



[jira] Updated: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2025:
----------------------------------

    Attachment:     (was: 2025-1.patch)



[jira] Commented: (HADOOP-2025) Instantiating a FileSystem object should guarantee the existence of the working directory

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534358 ] 

Doug Cutting commented on HADOOP-2025:
--------------------------------------

> Automatically creating a working directory may become awkward [ ... ]

Yes, I agree.

So we might instead:
  - initially leave the working directory unset if it does not exist (per FileSystem impl, in initialize)
  - check for the working directory's existence when it is accessed or set, throwing an exception if it does not exist (per FileSystem impl, in get/setWorkingDir).

Under that regime, folks who first try to run a mapreduce job with relative paths against an HDFS system they have not used will get a "working dir does not exist" exception, which sounds about right.  Thoughts?
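
A minimal sketch of that regime, assuming a local-filesystem stand-in (java.nio.file) instead of a real FileSystem implementation. CheckedWorkingDir and its methods are hypothetical names, and the exception type is an assumption, since the comment only says "an exception".

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical holder for a validated working directory; not Hadoop code. */
final class CheckedWorkingDir {
    // null means "unset": the default directory did not exist at construction.
    private Path workingDir;

    CheckedWorkingDir(Path defaultDir) {
        // Do not create the directory; just leave the field unset if absent.
        this.workingDir = Files.isDirectory(defaultDir) ? defaultDir : null;
    }

    /** Returns the working directory, or throws if it is unset or has vanished. */
    Path get() throws FileNotFoundException {
        if (workingDir == null || !Files.isDirectory(workingDir)) {
            throw new FileNotFoundException("working dir does not exist");
        }
        return workingDir;
    }

    /** Sets the working directory only if it exists. */
    void set(Path dir) throws FileNotFoundException {
        if (!Files.isDirectory(dir)) {
            throw new FileNotFoundException("working dir does not exist: " + dir);
        }
        workingDir = dir;
    }
}
```

With this shape, constructing the object never fails and never creates anything; the error surfaces only when a caller actually needs the working directory, for example to resolve a relative path.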

