Posted to mapreduce-issues@hadoop.apache.org by "Krishna Ramachandran (JIRA)" <ji...@apache.org> on 2010/08/20 02:09:17 UTC
[jira] Created: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Use new FileContext APIs for all mapreduce components
------------------------------------------------------
Key: MAPREDUCE-2020
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Krishna Ramachandran
Assignee: Krishna Ramachandran
Migrate mapreduce components to use the improved FileContext APIs implemented in
HADOOP-4952 and HADOOP-6223.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------
Attachment: mapred-2020-1.patch
updated
Fix testcase to use FileContext
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------
Attachment: mapred-2020-6.patch
Updated patch to address concerns from the last review:
Revised MapTask, ReduceTask and MergeManager.java
Removed the trailing blanks the patch introduced
Cleanup of Merge.merge has to wait, to avoid breaking compatibility
Removed unused members
MergeManager: removed an unused private method, but the constructor has to stay for now, again for compatibility
New components migrated
All files in
src/java/org/apache/hadoop/mapreduce/filecache/
src/java/org/apache/hadoop/mapred/TaskRunner.java
src/java/org/apache/hadoop/mapred/LocalJobRunner.java
testpatch output
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 9 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020-6.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------
Attachment: mapred-2020.patch
First cut, with primary focus on JobTracker, UserLogCleaner and some util classes.
TaskTracker, JobHistory, CleanUpQueue and other components are "work in progress" and not part of this patch.
Initial goals:
get initial feedback from mapred and hdfs
ask for enhancements/fixes from DFS where inadequate/broken
Optimize/eliminate needless RPC calls (exists() checks)
Streamline API calls (eliminate calls to FileSystem)
refactoring - work in progress
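The goal of eliminating needless exists() checks can be illustrated with a minimal sketch. It uses java.nio.file from the JDK as a stand-in for the Hadoop file API (an assumption made to keep the example self-contained), and the class and method names are hypothetical. On a remote file system each exists() call is an extra round trip, and its answer can be stale by the time the file is opened.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical illustration; java.nio.file stands in for the Hadoop file API. */
final class OpenWithoutExists {

  /** Check-then-act: two calls per read, and the file may vanish in between. */
  static String readWithCheck(Path p, String fallback) {
    try {
      if (!Files.exists(p)) {  // extra call (an RPC on a remote file system)
        return fallback;
      }
      return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Single call: attempt the read and treat "not found" as the fallback case. */
  static String readDirect(Path p, String fallback) {
    try {
      return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    } catch (NoSuchFileException e) {
      return fallback;  // only a missing file maps to the fallback
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Both variants return the same result for a missing file; the second simply skips the separate existence probe.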
"ant test" did not show any regressions
testpatch output
[exec]
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Commented: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Greg Roelofs (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907497#action_12907497 ]
Greg Roelofs commented on MAPREDUCE-2020:
-----------------------------------------
Substantive:
* {{"fs.AbstractFileSystem.file.impl"}} probably should be new {{JobContext.FOO}} style
** 3 instances (at least)
* Merger.java: {{merge()}} method madness: how many do we need? already had 7; now have 14... where does it end??
* MultiFileInputFormat.java: lose {{import org.apache.hadoop.fs.FileSystem}}: not used; slows build, adds confusion
** probably ditto MultiFileSplit.java and TestMRAsyncDiskService.java
* MergeManager.java: _massive_ pile of duplicated constructor and finalMerge() code: share! (may have previously allowed in case of "short-term transition," but Hadoop API transitions are _not_ short-term => high risk of mismatch-errors in future changes; should share code where possible even for "temporary" cases, e.g., by calling private helper function from both copies)
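The advice to share code between the old-API and new-API copies could look roughly like the sketch below. Every name in it (LegacyFs, NewFc, MergeSettings, describe) is a hypothetical stand-in, not the actual MergeManager code. Because the fields are final, the shared logic sits in a private constructor rather than a private helper method; the idea is the same.

```java
/** Hypothetical stand-in for an old-API file system handle. */
final class LegacyFs {
  final String uri;
  LegacyFs(String uri) { this.uri = uri; }
}

/** Hypothetical stand-in for a new-API file context handle. */
final class NewFc {
  final String uri;
  NewFc(String uri) { this.uri = uri; }
}

/** Hypothetical class with one constructor per API and a single shared body. */
final class MergeSettings {
  private final String fsUri;
  private final int ioSortFactor;

  /** Old-API variant: kept for compatibility, deprecated when the new one is added. */
  @Deprecated
  MergeSettings(LegacyFs fs, int ioSortFactor) {
    this(fs.uri, ioSortFactor);
  }

  /** New-API variant. */
  MergeSettings(NewFc fc, int ioSortFactor) {
    this(fc.uri, ioSortFactor);
  }

  /** Shared private constructor: validation and setup live in exactly one place. */
  private MergeSettings(String fsUri, int ioSortFactor) {
    if (ioSortFactor < 2) {
      throw new IllegalArgumentException("ioSortFactor must be >= 2: " + ioSortFactor);
    }
    this.fsUri = fsUri;
    this.ioSortFactor = ioSortFactor;
  }

  String describe() {
    return fsUri + " factor=" + ioSortFactor;
  }
}
```

With this shape, a later change to the validation or setup cannot be applied to one copy and forgotten in the other.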
Cosmetic:
* still adding trailing whitespace (IFile.java, Merger.java)
** if necessary, fire up vim on diff and search: /^+.* $
* still bad wraps, e.g.:
{noformat}
+ Deserializer<T> deserializer = (Deserializer<T>) factory
+ .getDeserializer(cls);
{noformat}
or lack of wrap:
{noformat}
+ job.set("fs.AbstractFileSystem.file.impl", "org.apache.hadoop.fs.local.RawLocalFs");
{noformat}
** if line starts with a period, almost guaranteed to be wrong
* avoid superfluous "this." decorations (Merger.java, MergeManager.java):
{noformat}
+ this.mapOutputsCounter = mergedMapOutputsCounter;
+ this.localFC = null;
+ this.rfc = null;
{noformat}
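The vim search for trailing whitespace on added lines can also be automated. The following is a small sketch, not part of any Hadoop tooling; the class and method names are hypothetical. Unlike the vim pattern above, it also flags trailing tabs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: flags added lines in a unified diff that end in whitespace. */
final class TrailingWhitespaceCheck {
  // An added line starts with '+' and here ends in at least one space or tab.
  private static final Pattern ADDED_WITH_TRAILING_WS =
      Pattern.compile("^\\+.*[ \\t]+$");

  /** Returns the 1-based diff line numbers of added lines with trailing blanks. */
  static List<Integer> offendingLines(List<String> diffLines) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < diffLines.size(); i++) {
      String line = diffLines.get(i);
      if (line.startsWith("+++ ")) {
        continue;  // new-file header of the diff, not an added code line
      }
      if (ADDED_WITH_TRAILING_WS.matcher(line).matches()) {
        hits.add(i + 1);
      }
    }
    return hits;
  }
}
```

Context lines and removed lines are ignored, so pre-existing trailing whitespace in old code does not show up in the report.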
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Commented: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904710#action_12904710 ]
Krishna Ramachandran commented on MAPREDUCE-2020:
-------------------------------------------------
Thanks Greg!
I will include these in the next rev
Krishna
On 8/30/10 6:21 PM, "Greg Roelofs (JIRA)" <ji...@apache.org> wrote:
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Commented: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Greg Roelofs (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904479#action_12904479 ]
Greg Roelofs commented on MAPREDUCE-2020:
-----------------------------------------
Looks good so far. There are some cosmetic issues and some leftover TODO items, but nothing major and only one or two medium-level items.
general:
* remove all trailing whitespace from _new_ code (old code is full of it and makes diffs huge, but no excuse for adding to problem)
* don't replicate, share/reuse (if possible) - or, if adding new-API variant, deprecate old-API variant at same time
JobTracker.java:
* @@ -1575,12 +1584,10 @@
** commented-out error message
** potential conversion of non-access exception to AccessControlException
* @@ -1599,11 +1606,11 @@
** TODO
* @@ -2380,7 +2387,7 @@
** debug converted to info
* @@ -3075,8 +3082,9 @@
** {{filecontext.mkdir(jobDir, new FsPermission(SYSTEM_DIR_PERMISSION), true);}} - prefer try/catch block over uncaught exceptions, if for no other reason than to customize error message
* @@ -4759,6 +4767,19 @@
** TODO
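As an illustration of the try/catch suggestion for the mkdir call above: a minimal sketch that wraps directory creation so the failure carries a customized message. It uses java.nio.file from the JDK instead of FileContext (an assumption, to keep it self-contained), and SystemDirSetup and createJobDir are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration; java.nio.file stands in for the FileContext call. */
final class SystemDirSetup {

  /** Creates jobDir (and any missing parents), adding context to any failure. */
  static void createJobDir(Path jobDir) throws IOException {
    try {
      Files.createDirectories(jobDir);
    } catch (IOException e) {
      // Keep the original exception as the cause so no detail is lost.
      throw new IOException("Could not create job directory " + jobDir + ": " + e, e);
    }
  }
}
```

The caller still sees an IOException, but the message now names the directory and the operation that failed.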
UserLogCleaner.java:
* @@ -59,7 +60,7 @@
** wrap at 80, and _don't_ do so at class-member dot if don't have to (ugh!) - member reference is highest-precedence operator of anything in Java/C/C++, which means it's the _least_ preferred break-point per Apache code conventions
BAD:
{noformat}
logAsyncDisk = new MRAsyncDiskService(FileContext.getLocalFSFileContext(conf), TaskLog
    .getUserLogDir().toString());
{noformat}
FAIR:
{noformat}
logAsyncDisk = new MRAsyncDiskService(
    FileContext.getLocalFSFileContext(conf),
    TaskLog.getUserLogDir().toString());
{noformat}
GOOD:
{noformat}
logAsyncDisk =
    new MRAsyncDiskService(FileContext.getLocalFSFileContext(conf),
                           TaskLog.getUserLogDir().toString());
{noformat}
ALSO GOOD:
{noformat}
logAsyncDisk =
  new MRAsyncDiskService(FileContext.getLocalFSFileContext(conf),
    TaskLog.getUserLogDir().toString());
{noformat}
JobHistory.java:
* @@ -180,6 +184,40 @@
** in general, avoid duplicating code: either share/reuse or, if providing new-API version (as here), deprecate old one when new one added
** blank line required between methods
* @@ -190,6 +228,15 @@
** blank line required between methods
MRAsyncDiskService.java:
* @@ -56,9 +60,13 @@
** {{FsPermission.createImmutable((short) 0700); // rwx------}} - see Localizer.PermissionsHandler.sevenZeroZero: use?
* @@ -88,7 +96,71 @@
** deprecate old ctor
** {{* be absolte paths,}} - fix spelling (both cases)
** {{this.volumes = new String[nonCanonicalVols.length];}} - ugh, lose the "this." for everything that doesn't need it (i.e., everything that's not a duplicate name of a method argument)
* @@ -98,10 +170,12 @@
** wrap per coding conventions
* @@ -114,13 +188,14 @@
** commented-out line
* @@ -246,18 +321,9 @@
** not equivalent semantics: if throw due to perms or whatever, will be converted to FileNotFoundException
* @@ -296,10 +362,10 @@
** wrap per coding conventions
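The "not equivalent semantics" point above is about catching too broadly. A minimal sketch of the difference, using java.nio.file as a stand-in (an assumption) and hypothetical names:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical illustration of lossy versus narrow exception translation. */
final class StatusLookup {

  /** Lossy: every failure, including a permission error, is reported as "not found". */
  static long sizeLossy(Path p) throws FileNotFoundException {
    try {
      return Files.size(p);
    } catch (IOException e) {
      throw new FileNotFoundException(p + " does not exist");  // may be untrue
    }
  }

  /** Narrow: only a genuinely missing file becomes FileNotFoundException. */
  static long sizeNarrow(Path p) throws IOException {
    try {
      return Files.size(p);
    } catch (NoSuchFileException e) {
      throw new FileNotFoundException(p + " does not exist");
    }
    // Any other IOException (e.g. AccessDeniedException) propagates unchanged.
  }
}
```

With the narrow variant a caller can still tell "missing" apart from "present but unreadable", which the lossy variant hides.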
MiniMRCluster.java:
* @@ -375,7 +377,7 @@
** got rid of one trailing space but left the other one??
* @@ -388,6 +390,18 @@
** don't replicate, reuse! in this case, namenode-arg configureJobConf() can call this new variant and just add extra namenode config call first (or last, whatever)
** nuke new trailing spaces
TestJobTrackerStart.java:
* @@ -29,7 +30,9 @@
** what is purpose of dual call to configureJobConf() ?
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------
Attachment: mapred-2020-4.patch
Updated patch (still preliminary):
More classes use FileContext:
MapTask
ReduceTask
+ their dependencies
MultiFileInputFormat and
MultiFileSplit
+ additional fixes from previous review (Greg)
Work In Progress:
refactor
BlockLocation API usage (no FileContext)
SequenceFileInputFormat
Add test reports (ant test and ant testpatch)
Following will not be migrated for now:
SequenceFile.createWriter in SkipFile module
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Commented: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905649#action_12905649 ]
Krishna Ramachandran commented on MAPREDUCE-2020:
-------------------------------------------------
Comments from Chris D
> Quick update:
> Am working on Map/ReduceTask migration. There are couple potential
> blockers for a full migration
> • MapTask uses RawFileSystem in some places on local files which
> likely saves checksum overheads. But there is no convenient way of
> getting a handle if we use LocalFileContext - talking to hdfs folks
> on this.
> • Near term we may have to move away from RawFileSystem and use
> LocalFileSystem
Absolutely not. The spill files require chunking of the checksum,
which (AFAIK) is not available from LocalFileSystem APIs. We don't
require redundant checksumming and, given that this is in an inner
loop, ought not to tolerate it.
> • may not be that bad as crc operations are native and should be
> fast (even with Raw MapTask enables some approximate checksum using
> ChecksumFileSystem)
The native code is an overhead, not an advantage. Nicholas rewrote the
checksum code in HADOOP-6166 to speed this up in HDFS and MR. AFAIK,
MapTask does not use ChecksumFileSystem anywhere in the data path...
it might use it for ancillary files somewhere, but the spill, merge,
and output serving should use the raw FS exclusively. -C
> • sequenceFile.createWriter is hadoop utility and takes FileSystem
> as a parameter.
>
> I have a cleaner solution for exists() - no longer needs a workaround.
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------
Attachment: mapred-2020-11.patch
Updated
Use FileContext for most OutputFormat and subclasses
There are a few outstanding issues to be addressed by DFS team - will be tracked using separate JIRAs
Major ones:
1) Security (AccessControl using UGI)
2) Enable ProgressReport while creating output stream
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020-10.patch, mapred-2020-11.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020-6.patch, mapred-2020-7.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------
Attachment: mapred-2020-5.patch
Fix for MultiFileSplit
Use FileContext to get BlockLocations
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------
Attachment: mapred-2020-7.patch
Updated patch with more components using FileContext APIs
Changes since last rev (so only need to focus on these)
Updated src/java/org/apache/hadoop/mapred/JobTracker.java (due to trunk changes)
filecache
---------
src/java/org/apache/hadoop/mapreduce/filecache/TaskDistributedCache.java
src/java/org/apache/hadoop/mapreduce/filecache/DistributedCache.java
src/java/org/apache/hadoop/mapreduce/filecache/TrackerDistributedCacheManager.java
Job Client components
--------------------------
src/java/org/apache/hadoop/mapreduce/Cluster.java
src/java/org/apache/hadoop/mapreduce/Job.java
src/java/org/apache/hadoop/mapreduce/JobSubmitter.java
Others
----------
src/java/org/apache/hadoop/mapred/JobInProgress.java
src/java/org/apache/hadoop/mapred/LocalJobRunner.java
src/java/org/apache/hadoop/mapred/lib/FilterOutputFormat.java
src/java/org/apache/hadoop/mapred/lib/NullOutputFormat.java
src/java/org/apache/hadoop/mapred/lib/db/DBOutputFormat.java
src/java/org/apache/hadoop/mapred/FileOutputFormat.java
All public APIs with dependence on FileSystem are left alone for compatibility (tools like oozie or pig may use these)
Most private references/APIs are migrated
Though the migration was straightforward, some inconsistent usage caused problems.
Examples:
path.getFileSystem(conf) - no equivalent API, and the DFS team's recommendation is not to use this
getting the raw file system via FileSystem.getRaw() - needs a workaround
copyTo/FromLocal - use the generalized copy() from the FileContext.Util class, which requires a fully qualified path
fs.getUri() - not available via the context (workaround)
FileSystem.closeAllforUGI() - not available yet
getFileSystemName() - a public API in JT; it is unclear why this is needed
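Since the generalized copy() expects fully qualified paths, callers that used to pass bare paths need to qualify them first. The sketch below shows what "qualifying" means, using only java.net.URI. The PathQualifier class and its qualify method are hypothetical and are not the Hadoop implementation. It assumes the path string is already a valid URI path (no spaces or other characters that need escaping).

```java
import java.net.URI;

/** Hypothetical illustration of turning a bare path into a fully qualified URI. */
final class PathQualifier {

  /**
   * Resolves path against a default file system URI and a working directory.
   * A path that already carries a scheme is returned unchanged.
   */
  static URI qualify(URI defaultFs, String workingDir, String path) {
    URI given = URI.create(path);
    if (given.getScheme() != null) {
      return given;  // already fully qualified
    }
    // Relative paths are interpreted against the working directory.
    String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
    return defaultFs.resolve(absolute);
  }
}
```

For example, with a default file system of hdfs://nn:8020/ and a working directory of /user/krishna, the relative path out/part-0 would qualify to hdfs://nn:8020/user/krishna/out/part-0 (host name and user directory are made up for the example).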
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020-6.patch, mapred-2020-7.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223
[jira] Updated: (MAPREDUCE-2020) Use new FileContext APIs for all
mapreduce components
Posted by "Krishna Ramachandran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Ramachandran updated MAPREDUCE-2020:
--------------------------------------------
Attachment: mapred-2020-10.patch
Revised and expanded
Incorporate previous review comments
Cleanup most private APIs to use only fileContext
The following components have not been migrated, or only partially:
TaskTracker
JobHistory (partial)
MapTask (partial - references to getRecordWriter)
ReduceTask(partial - skipWriter)
OutputFormat and implementations(partial)
Security (ugi) changes for JT are not in this patch - will update
> Use new FileContext APIs for all mapreduce components
> ------------------------------------------------------
>
> Key: MAPREDUCE-2020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Krishna Ramachandran
> Assignee: Krishna Ramachandran
> Attachments: mapred-2020-1.patch, mapred-2020-10.patch, mapred-2020-4.patch, mapred-2020-5.patch, mapred-2020-6.patch, mapred-2020-7.patch, mapred-2020.patch
>
>
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and
> HADOOP-6223