Posted to mapreduce-issues@hadoop.apache.org by "Krishna Ramachandran (JIRA)" <ji...@apache.org> on 2010/08/20 02:09:17 UTC

[jira] Created: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Use new FileContext APIs for all mapreduce components 
------------------------------------------------------

                 Key: MAPREDUCE-2020
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 0.22.0
            Reporter: Krishna Ramachandran
            Assignee: Krishna Ramachandran


Migrate mapreduce components to use the improved FileContext APIs implemented in HADOOP-4952 and HADOOP-6223.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------

    Attachment: mapred-2020-1.patch

Updated: fixed the test case to use FileContext.

> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------

    Attachment: mapred-2020-6.patch

Updated patch to address concerns from the last review:

Revised MapTask, ReduceTask and MergeManager.java

Removed trailing blanks the patch introduced

Cleanup of Merger.merge() has to wait to avoid breaking compatibility.

Removed unused members

MergeManager: removed an unused private method, but the constructor has to stay for now, again for compatibility.

New components migrated

All files in
src/java/org/apache/hadoop/mapreduce/filecache/

src/java/org/apache/hadoop/mapred/TaskRunner.java

src/java/org/apache/hadoop/mapred/LocalJobRunner.java

testpatch output

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 9 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.




> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020-6.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------

    Attachment: mapred-2020.patch

First cut, with primary focus on JobTracker, UserLogCleaner and some utility classes.

TaskTracker, JobHistory, CleanUpQueue and other components are still a work in progress and not part of this patch.

Initial goals:
get initial feedback from mapred and hdfs
ask DFS for enhancements/fixes where the APIs are inadequate or broken
optimize/eliminate needless RPC calls (exists() checks)
streamline API calls (eliminate calls to FileSystem)
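One way to drop a separate exists() round trip is to issue the real call directly and treat FileNotFoundException as the "absent" case. The sketch below assumes a hypothetical `MetaStore` interface standing in for the file-system client (it is not a Hadoop API), where each call represents one RPC. It deliberately catches only FileNotFoundException, so permission and other I/O errors still propagate instead of being reported as a missing file.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a remote file-system client; every call is one RPC. */
interface MetaStore {
  /** Returns the file length, or throws FileNotFoundException if the path is absent. */
  long getLength(String path) throws IOException;
}

final class SingleRpcLookup {
  private SingleRpcLookup() {}

  /**
   * Returns the length of path, or -1 if it does not exist, using one call
   * instead of exists() followed by getLength(). Only FileNotFoundException
   * means "absent"; any other IOException (e.g. a permission error) is rethrown.
   */
  static long lengthOrMinusOne(MetaStore store, String path) throws IOException {
    try {
      return store.getLength(path);
    } catch (FileNotFoundException e) {
      return -1L;
    }
  }

  /** In-memory MetaStore for demonstration; counts calls to show the RPC saving. */
  static final class InMemoryStore implements MetaStore {
    final Map<String, Long> files = new HashMap<>();
    int calls = 0;

    @Override
    public long getLength(String path) throws IOException {
      calls++;
      if (path.startsWith("/private/")) {
        throw new IOException("Permission denied: " + path); // simulated non-FNF failure
      }
      Long len = files.get(path);
      if (len == null) {
        throw new FileNotFoundException(path);
      }
      return len;
    }
  }
}
```

Two lookups here cost two calls; the exists()-then-get pattern would have cost up to four.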

refactoring - work in progress

"ant test" did not show any regressions

testpatch output

     [exec]
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.






> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Commented: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Greg Roelofs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907497#action_12907497 ] 

Greg Roelofs commented on MAPREDUCE-2020:
-----------------------------------------

Substantive:

 * {{"fs.AbstractFileSystem.file.impl"}} probably should be new {{JobContext.FOO}} style
   ** 3 instances (at least)
 * Merger.java:  {{merge()}} method madness:  how many do we need?  already had 7; now have 14...  where does it end??
 * MultiFileInputFormat.java:  lose {{import org.apache.hadoop.fs.FileSystem}}:  not used; slows build, adds confusion
   ** probably ditto MultiFileSplit.java and TestMRAsyncDiskService.java
 * MergeManager.java:  _massive_ pile of duplicated constructor and finalMerge() code:  share!  (may have previously allowed in case of "short-term transition," but Hadoop API transitions are _not_ short-term => high risk of mismatch-errors in future changes; should share code where possible even for "temporary" cases, e.g., by calling private helper function from both copies)
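Greg's suggestion of having both constructor copies call one private helper can be sketched as below. The class, its fields, and the `String`/`URI` parameters are hypothetical placeholders for the old FileSystem-based and new FileContext-based arguments, not the actual MergeManager signature; the `factor < 2` check is purely illustrative.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical class with an old-style and a new-style constructor sharing one helper. */
final class MergeSettings {
  private String localRoot;
  private int ioSortFactor;

  /** Old-API variant: root as a plain string (stand-in for a FileSystem argument). */
  @Deprecated
  MergeSettings(String localRoot, int ioSortFactor) {
    init(localRoot, ioSortFactor);
  }

  /** New-API variant: root as a URI (stand-in for a FileContext argument). */
  MergeSettings(URI localRoot, int ioSortFactor) {
    init(Objects.requireNonNull(localRoot).getPath(), ioSortFactor);
  }

  /** Shared setup: validation and field assignment live in exactly one place. */
  private void init(String root, int factor) {
    if (factor < 2) { // illustrative check only
      throw new IllegalArgumentException("merge factor must be >= 2, got " + factor);
    }
    localRoot = Objects.requireNonNull(root);
    ioSortFactor = factor;
  }

  String localRoot() { return localRoot; }

  int ioSortFactor() { return ioSortFactor; }
}
```

When the old argument can be converted up front, chaining with `this(...)` achieves the same sharing and lets the fields stay `final`.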

Cosmetic:

 * still adding trailing whitespace (IFile.java, Merger.java)
   ** if necessary, fire up vim on diff and search:  /^+.* $
 * still bad wraps, e.g.:
{noformat}
+    Deserializer<T> deserializer = (Deserializer<T>) factory
+        .getDeserializer(cls);
{noformat}
   or lack of wrap:
{noformat}
+      job.set("fs.AbstractFileSystem.file.impl", "org.apache.hadoop.fs.local.RawLocalFs");
{noformat}
   ** if line starts with a period, almost guaranteed to be wrong
 * avoid superfluous "this." decorations (Merger.java, MergeManager.java):
{noformat}
+      this.mapOutputsCounter = mergedMapOutputsCounter;
+    this.localFC = null;
+    this.rfc = null;
{noformat}
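The first substantive point, naming the `"fs.AbstractFileSystem.file.impl"` key once as a constant instead of repeating the literal, could look like the sketch below. The holder class and constant names are hypothetical (the review only says "new JobContext.FOO style"), and a plain `Map` stands in for `JobConf` so the snippet runs without Hadoop on the classpath; the key and value strings are the ones from the quoted patch line.

```java
import java.util.Map;

/** Hypothetical holder for configuration keys, in the spirit of JobContext constants. */
final class FsKeys {
  private FsKeys() {}

  /** Key selecting the AbstractFileSystem implementation for the "file" scheme. */
  static final String LOCAL_FS_IMPL = "fs.AbstractFileSystem.file.impl";

  /** Value used in the patch to select the raw local file system. */
  static final String RAW_LOCAL_FS = "org.apache.hadoop.fs.local.RawLocalFs";
}

final class ConfigExample {
  private ConfigExample() {}

  /** Map-based stand-in for job.set(...); call sites reference the constants. */
  static Map<String, String> useRawLocalFs(Map<String, String> conf) {
    conf.put(FsKeys.LOCAL_FS_IMPL, FsKeys.RAW_LOCAL_FS);
    return conf;
  }
}
```

A side benefit is that the equivalent `job.set(FsKeys.LOCAL_FS_IMPL, FsKeys.RAW_LOCAL_FS);` fits within 80 columns, which also resolves the "lack of wrap" example.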


> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Commented: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904710#action_12904710 ] 

Krishna Ramachandran commented on MAPREDUCE-2020:
-------------------------------------------------

Thanks Greg!

I will include these in the next rev
Krishna

On 8/30/10 6:21 PM, "Greg Roelofs (JIRA)" <ji...@apache.org> wrote:



    [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904479#action_12904479 ]

Greg Roelofs commented on MAPREDUCE-2020:
-----------------------------------------

Looks good so far.  There are some cosmetic issues and some leftover TODO items, but nothing major and only one or two medium-level items.

general:
 * remove all trailing whitespace from _new_ code (old code is full of it and makes diffs huge, but no excuse for adding to problem)
 * don't replicate, share/reuse (if possible) - or, if adding new-API variant, deprecate old-API variant at same time
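The first general point, not adding trailing whitespace in new code, can be checked mechanically before a patch is posted. A minimal sketch, assuming the patch is a unified diff held in a string; `TrailingWhitespace` is a hypothetical helper, not part of Hadoop's test-patch tooling. Only added lines are checked, so whitespace already present in old code is not flagged.

```java
import java.util.ArrayList;
import java.util.List;

final class TrailingWhitespace {
  private TrailingWhitespace() {}

  /**
   * Returns the 1-based diff line numbers of added lines ("+" prefix) that end
   * in a space or tab. The "+++ " file header is skipped; context and removed
   * lines are ignored.
   */
  static List<Integer> findInAddedLines(String unifiedDiff) {
    List<Integer> hits = new ArrayList<>();
    String[] lines = unifiedDiff.split("\n", -1);
    for (int i = 0; i < lines.length; i++) {
      String line = lines[i];
      if (line.endsWith("\r")) {
        line = line.substring(0, line.length() - 1); // tolerate CRLF diffs
      }
      if (!line.startsWith("+") || line.startsWith("+++ ") || line.length() < 2) {
        continue;
      }
      char last = line.charAt(line.length() - 1);
      if (last == ' ' || last == '\t') {
        hits.add(i + 1);
      }
    }
    return hits;
  }
}
```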

JobTracker.java:
 * @@ -1575,12 +1584,10 @@
   ** commented-out error message
   ** potential conversion of non-access exception to AccessControlException
 * @@ -1599,11 +1606,11 @@
   ** TODO
 * @@ -2380,7 +2387,7 @@
   ** debug converted to info
 * @@ -3075,8 +3082,9 @@
   ** {{filecontext.mkdir(jobDir, new FsPermission(SYSTEM_DIR_PERMISSION), true);}} - prefer try/catch block over uncaught exceptions, if for no other reason than to customize error message
 * @@ -4759,6 +4767,19 @@
   ** TODO
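The try/catch suggestion for the `mkdir` call can be sketched as follows. `DirCreator` is a hypothetical stand-in for `FileContext.mkdir(Path, FsPermission, boolean)` so the example runs without Hadoop; the point is rethrowing with a message that names the job and directory while keeping the original exception as the cause. Note that wrapping changes the exception type callers see, so if callers need to distinguish subclasses (for example access-control failures), those should be caught and rethrown separately.

```java
import java.io.IOException;

/** Hypothetical stand-in for a directory-creating call such as FileContext.mkdir. */
interface DirCreator {
  void mkdir(String dir, boolean createParent) throws IOException;
}

final class JobDirSetup {
  private JobDirSetup() {}

  /** Creates the job directory, rethrowing failures with a job-specific message. */
  static void createJobDir(DirCreator fs, String jobId, String jobDir) throws IOException {
    try {
      fs.mkdir(jobDir, true);
    } catch (IOException e) {
      // Customize the message but keep the original exception as the cause.
      throw new IOException("Failed to create system directory " + jobDir
          + " for job " + jobId + ": " + e.getMessage(), e);
    }
  }
}
```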

UserLogCleaner.java:
 * @@ -59,7 +60,7 @@
   ** wrap at 80, and _don't_ do so at class-member dot if don't have to (ugh!) - member reference is highest-precedence operator of anything in Java/C/C++, which means it's the _least_ preferred break-point per Apache code conventions

BAD:
{noformat}
     logAsyncDisk = new MRAsyncDiskService(FileContext.getLocalFSFileContext(conf), TaskLog
         .getUserLogDir().toString());
{noformat}

FAIR:
{noformat}
     logAsyncDisk = new MRAsyncDiskService(
         FileContext.getLocalFSFileContext(conf),
         TaskLog.getUserLogDir().toString());
{noformat}

GOOD:
{noformat}
     logAsyncDisk =
         new MRAsyncDiskService(FileContext.getLocalFSFileContext(conf),
                                TaskLog.getUserLogDir().toString());
{noformat}

ALSO GOOD:
{noformat}
     logAsyncDisk =
         new MRAsyncDiskService(FileContext.getLocalFSFileContext(conf),
         TaskLog.getUserLogDir().toString());
{noformat}

JobHistory.java:
 * @@ -180,6 +184,40 @@
   ** in general, avoid duplicating code:  either share/reuse or, if providing new-API version (as here), deprecate old one when new one added
   ** blank line required between methods
 * @@ -190,6 +228,15 @@
   ** blank line required between methods

MRAsyncDiskService.java:
 * @@ -56,9 +60,13 @@
   ** {{FsPermission.createImmutable((short) 0700); // rwx------}} - see Localizer.PermissionsHandler.sevenZeroZero:  use?
 * @@ -88,7 +96,71 @@
   ** deprecate old ctor
   ** {{* be absolte paths,}} - fix spelling (both cases)
   ** {{this.volumes = new String[nonCanonicalVols.length];}} - ugh, lose the "this." for everything that doesn't need it (i.e., everything that's not a duplicate name of a method argument)
 * @@ -98,10 +170,12 @@
   ** wrap per coding conventions
 * @@ -114,13 +188,14 @@
   ** commented-out line
 * @@ -246,18 +321,9 @@
   ** not equivalent semantics:  if throw due to perms or whatever, will be converted to FileNotFoundException
 * @@ -296,10 +362,10 @@
   ** wrap per coding conventions
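The `(short) 0700` in the `FsPermission.createImmutable` call is three octal digits for owner, group and other, each digit summing read=4, write=2, execute=1, which is why it is commented as `rwx------`. A small JDK-only sketch of that decoding; `OctalMode` is a hypothetical helper, not Hadoop's `FsPermission.toString()`.

```java
final class OctalMode {
  private OctalMode() {}

  /** Renders the low nine permission bits of mode as an ls-style string, e.g. 0700 -> "rwx------". */
  static String toRwx(short mode) {
    final String letters = "rwx";
    StringBuilder sb = new StringBuilder(9);
    for (int bit = 8; bit >= 0; bit--) {
      // Bits 8..6 are owner, 5..3 group, 2..0 other; within each triple the order is r, w, x.
      boolean set = (mode & (1 << bit)) != 0;
      sb.append(set ? letters.charAt((8 - bit) % 3) : '-');
    }
    return sb.toString();
  }
}
```

For example, `toRwx((short) 0755)` yields `rwxr-xr-x`.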

MiniMRCluster.java:
 * @@ -375,7 +377,7 @@
   ** got rid of one trailing space but left the other one??
 * @@ -388,6 +390,18 @@
   ** don't replicate, reuse!  in this case, namenode-arg configureJobConf() can call this new variant and just add extra namenode config call first (or last, whatever)
   ** nuke new trailing spaces
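The "don't replicate, reuse" point for `configureJobConf()` amounts to having the namenode-taking overload call the simpler variant and then add its one extra setting. A sketch with hypothetical signatures, using a `Map` in place of `JobConf`; the real MiniMRCluster methods take different arguments, and the two key names are only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

final class MiniClusterConf {
  private MiniClusterConf() {}

  /** Base variant: settings a job needs to reach the mini JobTracker. */
  static Map<String, String> configureJobConf(Map<String, String> conf, String jobTrackerAddr) {
    Map<String, String> result = new HashMap<>(conf);
    result.put("mapred.job.tracker", jobTrackerAddr);
    return result;
  }

  /** Namenode variant: reuses the base variant, then adds the single extra setting. */
  static Map<String, String> configureJobConf(Map<String, String> conf,
                                              String namenode,
                                              String jobTrackerAddr) {
    Map<String, String> result = configureJobConf(conf, jobTrackerAddr);
    result.put("fs.default.name", namenode);
    return result;
  }
}
```

Any later change to the base settings then reaches both variants automatically.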

TestJobTrackerStart.java:
 * @@ -29,7 +30,9 @@
   ** what is purpose of dual call to configureJobConf() ?







> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223





[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------

    Attachment: mapred-2020-4.patch

Updated patch (still preliminary):
More classes use FileContext:

MapTask
ReduceTask
+ their dependencies

MultiFileInputFormat and
MultiFileSplit

+ additional fixes from the previous review (Greg)

Work in progress:

refactor
BlockLocation API usage (no FileContext)
SequenceFileInputFormat
Add test reports (ant test and ant testpatch)

The following will not be migrated for now:

SequenceFile.createWriter in SkipFile module



> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Commented: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905649#action_12905649 ] 

Krishna Ramachandran commented on MAPREDUCE-2020:
-------------------------------------------------

Comments from Chris D

> Quick update:
> Am working on the Map/ReduceTask migration. There are a couple of potential
> blockers for a full migration:
>       • MapTask uses RawFileSystem in some places on local files, which
> likely saves checksum overhead. But there is no convenient way of
> getting a handle to it if we use LocalFileContext - talking to hdfs folks
> on this.
>               • In the near term we may have to move away from RawFileSystem and use
> LocalFileSystem

Absolutely not. The spill files require chunking of the checksum,
which (AFAIK) is not available from LocalFileSystem APIs. We don't
require redundant checksumming and, given that this is in an inner
loop, ought not to tolerate it.

>               • may not be that bad, as CRC operations are native and should be
> fast (even with Raw, MapTask enables some approximate checksum using
> ChecksumFileSystem)

The native code is an overhead, not an advantage. Nicholas rewrote the
checksum code in HADOOP-6166 to speed this up in HDFS and MR. AFAIK,
MapTask does not use ChecksumFileSystem anywhere in the data path...
it might use it for ancillary files somewhere, but the spill, merge,
and output serving should use the raw FS exclusively. -C

>       • SequenceFile.createWriter is a hadoop utility and takes FileSystem
> as a parameter.
>
> I have a cleaner solution for exists() - it no longer needs a workaround.
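"Chunking of the checksum" here means computing one checksum per fixed-size chunk of data rather than one per file, so each chunk can be verified on its own. A JDK-only sketch using `java.util.zip.CRC32`; the chunk size is a parameter (the 512 bytes used in the test is an assumption, not read from any Hadoop configuration), and this is not Hadoop's `DataChecksum` implementation.

```java
import java.util.zip.CRC32;

final class ChunkedCrc {
  private ChunkedCrc() {}

  /** Returns one CRC32 value per bytesPerChunk-sized chunk; the last chunk may be shorter. */
  static long[] checksums(byte[] data, int bytesPerChunk) {
    if (bytesPerChunk <= 0) {
      throw new IllegalArgumentException("bytesPerChunk must be positive");
    }
    int chunks = (data.length + bytesPerChunk - 1) / bytesPerChunk; // ceiling division
    long[] sums = new long[chunks];
    CRC32 crc = new CRC32();
    for (int i = 0; i < chunks; i++) {
      int off = i * bytesPerChunk;
      int len = Math.min(bytesPerChunk, data.length - off);
      crc.reset();
      crc.update(data, off, len);
      sums[i] = crc.getValue();
    }
    return sums;
  }
}
```

As I read Chris's reply, the spill format already carries checksums of this kind, so writing it through a checksummed file-system layer would checksum the same bytes a second time in the inner loop.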


> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------

    Attachment: mapred-2020-11.patch

Updated: use FileContext for most OutputFormat classes and their subclasses.

There are a few outstanding issues to be addressed by the DFS team; these will be tracked in separate JIRAs.

Major ones:

1) Security (AccessControl using UGI)

2) Enable ProgressReport while creating the output stream
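Item 2 concerns passing a progress callback when the output stream is created; the FileSystem `create` overloads accept an `org.apache.hadoop.util.Progressable` for this. A JDK-only sketch of the idea, assuming a hypothetical `ProgressCallback` interface and a `FilterOutputStream` wrapper that reports after every write, so a long write keeps signalling liveness. This is not the Hadoop API.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical callback, modeled loosely on Hadoop's Progressable. */
interface ProgressCallback {
  void progress();
}

/** Output stream wrapper that reports progress after every write call. */
final class ProgressReportingStream extends FilterOutputStream {
  private final ProgressCallback callback;

  ProgressReportingStream(OutputStream out, ProgressCallback callback) {
    super(out);
    this.callback = callback;
  }

  @Override
  public void write(int b) throws IOException {
    out.write(b);
    callback.progress();
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time loop
    callback.progress();
  }
}
```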



> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020-10.patch, mapred-2020-11.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020-6.patch, mapred-2020-7.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------

    Attachment: mapred-2020-5.patch

Fix for MultiFileSplit:
use FileContext to get BlockLocations

> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------

    Attachment: mapred-2020-7.patch

Updated patch with more components using FileContext APIs.
Changes since the last revision (reviewers only need to focus on these):

Updated src/java/org/apache/hadoop/mapred/JobTracker.java (due to trunk changes)

filecache
---------
 src/java/org/apache/hadoop/mapreduce/filecache/TaskDistributedCache.java
 src/java/org/apache/hadoop/mapreduce/filecache/DistributedCache.java
 src/java/org/apache/hadoop/mapreduce/filecache/TrackerDistributedCacheManager.java

Job Client components
--------------------------

 src/java/org/apache/hadoop/mapreduce/Cluster.java
 src/java/org/apache/hadoop/mapreduce/Job.java
 src/java/org/apache/hadoop/mapreduce/JobSubmitter.java

Others
----------
 src/java/org/apache/hadoop/mapred/JobInProgress.java
 src/java/org/apache/hadoop/mapred/LocalJobRunner.java
 src/java/org/apache/hadoop/mapred/lib/FilterOutputFormat.java
 src/java/org/apache/hadoop/mapred/lib/NullOutputFormat.java
 src/java/org/apache/hadoop/mapred/lib/db/DBOutputFormat.java
 src/java/org/apache/hadoop/mapred/FileOutputFormat.java

All public APIs that depend on FileSystem are left alone for compatibility (tools such as Oozie or Pig may use them).

Most private references/APIs are migrated.

Though the migration was straightforward, some inconsistent usage caused problems.

Examples:

path.getFileSystem(conf) - no equivalent API, and the DFS team's recommendation is not to use it
getting the raw FileSystem via FileSystem.getRaw() - workaround
copyTo/FromLocal - use the generalized copy() from the FileContext.Util class, which requires fully qualified paths
fs.getUri() - not available via FileContext (workaround)
FileSystem.closeAllForUGI() - not available yet

getFileSystemName() is a public API in JT; why is it needed?
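Because the generalized `copy()` wants fully qualified paths, a path such as `/user/alice/in` first has to be combined with the default file system's scheme and authority (for example `hdfs://nn:8020/user/alice/in`). A JDK-only sketch of that resolution with `java.net.URI`; the helper name, the example host and the working-directory handling are assumptions for illustration, not Hadoop's `Path.makeQualified`.

```java
import java.net.URI;

final class PathQualifier {
  private PathQualifier() {}

  /**
   * Qualifies path against a default file system URI and a working directory.
   * Paths that already carry a scheme are returned unchanged; absolute paths
   * take the default scheme and authority; relative paths are first resolved
   * against workingDir.
   */
  static URI qualify(String path, URI defaultFs, String workingDir) {
    URI p = URI.create(path);
    if (p.getScheme() != null) {
      return p; // already fully qualified
    }
    String absolute = path.startsWith("/")
        ? path
        : (workingDir.endsWith("/") ? workingDir : workingDir + "/") + path;
    return defaultFs.resolve(URI.create(absolute)).normalize();
  }
}
```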





> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020-6.patch, mapred-2020-7.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223



[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components

Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------

    Attachment: mapred-2020-10.patch

Revised and expanded:

Incorporated previous review comments

Cleaned up most private APIs to use only FileContext

The following components have not been migrated, or only partially:

TaskTracker
JobHistory (partial)
MapTask (partial - references to getRecordWriter)
ReduceTask (partial - skipWriter)
OutputFormat and implementations (partial)

Security (UGI) changes for JT are not in this patch - will update

> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020-10.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020-6.patch, mapred-2020-7.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and 
> HADOOP-6223
