Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2008/03/04 23:49:40 UTC

[jira] Commented: (HADOOP-2915) mapred output files and directories should be created as the job submitter, not tasktracker or jobtracker

    [ https://issues.apache.org/jira/browse/HADOOP-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575175#action_12575175 ] 

Owen O'Malley commented on HADOOP-2915:
---------------------------------------

You cannot use UserGroupInformation as the key in a hash table, because it defines neither a hash function nor equals. I would recommend using the user name instead, since String has all of the important methods defined.
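
To illustrate the point, here is a minimal sketch. UserInfo is a hypothetical stand-in for any class that does not override equals or hashCode; it is not the real UserGroupInformation, and the cached values are plain strings rather than file systems.

```java
import java.util.HashMap;
import java.util.Map;

public class UserKeyDemo {
    // Hypothetical stand-in: defines neither equals() nor hashCode(),
    // so it inherits identity-based versions from Object.
    static final class UserInfo {
        final String userName;
        UserInfo(String userName) { this.userName = userName; }
    }

    // Keyed by the object itself: a second instance for the same user
    // is never equal to the first, so the lookup misses.
    static boolean foundByObjectKey() {
        Map<UserInfo, String> cache = new HashMap<UserInfo, String>();
        cache.put(new UserInfo("test"), "fs-for-test");
        return cache.containsKey(new UserInfo("test"));
    }

    // Keyed by the user name: String compares by content, so the lookup hits.
    static boolean foundByNameKey() {
        Map<String, String> cache = new HashMap<String, String>();
        cache.put(new UserInfo("test").userName, "fs-for-test");
        return cache.containsKey(new UserInfo("test").userName);
    }

    public static void main(String[] args) {
        System.out.println("object key hit: " + foundByObjectKey());
        System.out.println("name key hit: " + foundByNameKey());
    }
}
```

With the object as the key, every new instance creates a new cache entry, so the cache never returns a hit for an equivalent user.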

In general, abbreviations in variable names are hard to read, and "saToFs" is pretty opaque.

The finally should go on the same line as the closing brace.

It seems really error-prone to have a cache that gives out FileSystems more than once and requires users to close them. Take, for instance, the case where a user has two jobs running at the same time: you can easily end up with two copies of the FileSystem being used for different jobs. I propose that we fix this by making the FileSystem cache contain weak references to the file systems and having the finalizers close the file systems.
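
A rough sketch of the weak-reference half of that proposal follows. WeakFsCache and FakeFileSystem are hypothetical names, not Hadoop classes, and the cache is keyed by user name as an assumption. The finalizer that would close the file system is deliberately left out, since when it runs depends on the garbage collector.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class WeakFsCache {
    // Hypothetical stand-in for a file system handle; not Hadoop's FileSystem.
    static final class FakeFileSystem {
        final String owner;
        FakeFileSystem(String owner) { this.owner = owner; }
    }

    // The cache holds only weak references, so it does not keep an
    // otherwise unused file system alive.
    private final Map<String, WeakReference<FakeFileSystem>> cache =
        new HashMap<String, WeakReference<FakeFileSystem>>();

    // Returns the live instance for this user if one is still strongly
    // referenced elsewhere; otherwise creates and caches a new one.
    synchronized FakeFileSystem get(String userName) {
        WeakReference<FakeFileSystem> ref = cache.get(userName);
        FakeFileSystem fs = (ref == null) ? null : ref.get();
        if (fs == null) {
            fs = new FakeFileSystem(userName);
            cache.put(userName, new WeakReference<FakeFileSystem>(fs));
        }
        return fs;
    }
}
```

As long as some job holds a strong reference, every caller for that user gets the same instance, and nobody has to decide who is responsible for closing it.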

The JobTracker should keep a reference to the output file system in JobInProgress and keep reusing it. When the job completes, the fields should be cleared by JobTracker.markCompletedJob, so that the file system can be reclaimed by the garbage collector.

> mapred output files and directories should be created as the job submitter, not tasktracker or jobtracker
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2915
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2915
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>             Fix For: 0.16.1
>
>         Attachments: 2915_20080229.patch, 2915_20080302.patch, 2915_20080303.patch
>
>
> Quoted from an email sent to core-dev by Andy Li:
> {quote}
> For example, assuming I have installed Hadoop with an account 'hadoop' and I am going to run my program with user account 'test'. I have created an input folder as /user/test/input/ with user 'test' and the permission is set to 0775.
> /user/test/input      <dir>          2008-02-27 01:20 rwxr-xr-x      test  hadoop
> When I run the MapReduce code, the output I specified will be set to user 'hadoop' instead of 'test'.
> /bin/hadoop jar /tmp/test_perm.jar -m 57 -r 3 "/user/test/input/l" "/user/test/output/"
> The directory "/user/test/output/" will have the following permission and user:group.
> /user/test/output    <dir>          2008-02-27 03:53        rwxr-xr-x hadoop  hadoop
> {quote}
