Posted to mapreduce-issues@hadoop.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2017/12/28 03:27:00 UTC

[jira] [Commented] (MAPREDUCE-7029) FileOutputCommitter#commitTask should delete task directory

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305014#comment-16305014 ] 

Hadoop QA commented on MAPREDUCE-7029:
--------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m  0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  8s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 16s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m  1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | MAPREDUCE-7029 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12903844/MAPREDUCE-7029.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6d7f9f67ae9c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 52babbb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7262/testReport/ |
| Max. process+thread count | 417 (vs. ulimit of 5000) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7262/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> FileOutputCommitter#commitTask should delete task directory
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-7029
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7029
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.8.2
>         Environment: - Google Cloud Storage (with the GCS connector: https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs) for HCFS compatibility.
> - FileOutputCommitter algorithm v2.
> - Running on Google Compute Engine with Java 8, Debian 8, Hadoop 2.8.2, Spark 2.2.0.
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>         Attachments: MAPREDUCE-7029.001.patch
>
>
> I ran a Spark job that outputs thousands of parquet files (i.e., there are thousands of reducers), and it hung for several minutes in the driver after all tasks were complete. Here is a very simple repro of the job (to be run in a spark-shell):
> {code:scala}
> spark.range(1L << 20).repartition(1 << 14).write.save("gs://some/path")
> {code}
> Spark actually calls into MapReduce's FileOutputCommitter. Job committing (specifically cleanupJob()) recursively deletes the job temporary directory, which is something like "gs://some/path/_temporary". If I understand correctly, on HDFS, this would be O(1), but on GCS (and every HCFS I know), this requires a full file tree walk. Deleting tens of thousands of objects in GCS takes several minutes.
> I propose that commitTask() recursively delete its task attempt temp directory (something like "gs://some/path/_temporary/attempt1/task1"). On HDFS, this is O(1) per task, so this adds very little overhead per task. On GCS (and other HCFSs), this parallelizes the deletion of the job temp directory across tasks.
> With the attached patch, the repro above went from taking ~10 minutes to taking ~5 minutes, and task time did not significantly change.
> Side note: I found this issue with Spark, but I assume it applies to a MapReduce job with thousands of reducers as well.
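
The proposal in the description above can be illustrated with a minimal sketch. This is not the attached patch and does not use the Hadoop FileSystem API. It uses java.nio.file on a local temp directory as a stand-in, and the class and method names (TaskDirCleanupSketch, createTaskDirs, commitTask, deleteRecursively) are hypothetical. It only shows the idea: if each task removes its own attempt directory at commit time, the final job-level cleanup has far fewer entries left to walk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: a local-filesystem stand-in, not Hadoop code.
public class TaskDirCleanupSketch {

    /** Deletes dir and everything under it (children first); returns entries removed. */
    static int deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return 0;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
        return entries.size();
    }

    /** Creates jobTemp/attempt1/task<i>/part-<i> for each task and returns the task dirs. */
    static List<Path> createTaskDirs(Path jobTemp, int tasks) throws IOException {
        List<Path> dirs = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Path taskDir = Files.createDirectories(jobTemp.resolve("attempt1").resolve("task" + i));
            Files.write(taskDir.resolve("part-" + i), new byte[] {1});
            dirs.add(taskDir);
        }
        return dirs;
    }

    /** Stand-in for a v2-style task commit: move output to dest, then delete the task dir. */
    static void commitTask(Path taskDir, Path dest) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(taskDir)) {
            files = listing.collect(Collectors.toList());
        }
        for (Path f : files) {
            Files.move(f, dest.resolve(f.getFileName()));
        }
        // The proposed extra step: each task cleans up after itself.
        deleteRecursively(taskDir);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("commit-sketch");
        Path jobTemp = out.resolve("_temporary");
        List<Path> taskDirs = createTaskDirs(jobTemp, 8);

        // Tasks commit concurrently, so their deletes run in parallel.
        taskDirs.parallelStream().forEach(t -> {
            try {
                commitTask(t, out);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        // Stand-in for job cleanup: only the parent directories remain.
        int left = deleteRecursively(jobTemp);
        System.out.println("entries left for job cleanup: " + left);
        deleteRecursively(out);
    }
}
```

On a local disk the difference is negligible. The point is only that the per-entry delete cost, which dominates on object stores according to the description, is spread across tasks rather than paid serially at job commit.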



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
