Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/10/20 20:38:09 UTC

[GitHub] [hadoop] steveloughran opened a new pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs slightly better.

steveloughran opened a new pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399


   
   All S3A committers can now have the purging of pending uploads on job
   commit disabled (new option; the old one is deprecated).
   
   More logging of what is going on with individual file load/upload/commit
   operations, including their duration.
   
   A test of concurrent jobs designed to trigger the specific failure
   conditions of Staging (job2 commit after task1 commit) and Magic
   (job2 commit before task2 commit).
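   
   For concurrent jobs writing to the same destination tree, a minimal
   configuration sketch (the option name is taken from this patch; the
   value shown is illustrative, not a recommended default):
   
       <property>
         <name>fs.s3a.committer.abort.pending.uploads</name>
         <value>false</value>
         <description>
           Do not abort all pending uploads under the destination path in
           job commit or abort; needed when more than one job writes to
           the same directory tree simultaneously.
         </description>
       </property>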
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-729115191


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 11s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 10s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 38s |  |  trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  compile  |  18m 13s |  |  trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   2m 53s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 48s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  spotbugs  |   1m 11s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 27s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 42s |  |  the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javac  |  20m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m  5s |  |  the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  javac  |  18m  5s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   2m 46s |  |  root: The patch generated 0 new + 48 unchanged - 1 fixed = 48 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/10/artifact/out/whitespace-eol.txt) |  The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  17m  9s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m 42s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   9m 49s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 35s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 196m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/10/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 965a3f2ebeb9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a7b923c80c6 |
   | Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/10/testReport/ |
   | Max. process+thread count | 1331 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/10/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522957970



##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
##########
@@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable {
 
   }
 
+
+  /**
+   * Run two jobs with the same destination and different output paths.
+   * <p></p>
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   * <p></p>
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+    describe("Run two jobs to the same destination, assert they both complete");
+    Configuration conf = getConfiguration();
+    conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+    // this job has a job ID generated and set as the spark UUID;
+    // the config is also set to require it.
+    // This mimics the Spark setup process.
+
+    String stage1Id = UUID.randomUUID().toString();
+    conf.set(SPARK_WRITE_UUID, stage1Id);
+    conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+    // create the job and write data in its task attempt
+    JobData jobData = startJob(true);
+    Job job1 = jobData.job;
+    AbstractS3ACommitter committer1 = jobData.committer;
+    JobContext jContext1 = jobData.jContext;
+    TaskAttemptContext tContext1 = jobData.tContext;
+    Path job1TaskOutputFile = jobData.writtenTextPath;
+
+    // the write path
+    Assertions.assertThat(committer1.getWorkPath().toString())
+        .describedAs("Work path of %s", committer1)
+        .contains(stage1Id);
+    // now build up a second job
+    String jobId2 = randomJobId();
+
+    // second job will use same ID
+    String attempt2 = taskAttempt0.toString();
+    TaskAttemptID taskAttempt2 = taskAttempt0;
+
+    // create the second job
+    Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+    c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+    Job job2 = newJob(outDir,
+        c2,
+        attempt2);
+    Configuration conf2 = job2.getConfiguration();

Review comment:
       done






[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522096138



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
##########
@@ -118,15 +114,14 @@ public StagingCommitter(Path outputPath,
     Configuration conf = getConf();
     this.uploadPartSize = conf.getLongBytes(
         MULTIPART_SIZE, DEFAULT_MULTIPART_SIZE);
-    this.uuid = getUploadUUID(conf, context.getJobID());
     this.uniqueFilenames = conf.getBoolean(
         FS_S3A_COMMITTER_STAGING_UNIQUE_FILENAMES,
         DEFAULT_STAGING_COMMITTER_UNIQUE_FILENAMES);
-    setWorkPath(buildWorkPath(context, uuid));
+    setWorkPath(buildWorkPath(context, this.getUUID()));

Review comment:
       relic of wrapping/pulling up the old code. Fixed. Also clarified the uuid javadocs now that SPARK-33402 is generating more unique job IDs






[GitHub] [hadoop] hadoop-yetus commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-717513018


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  7s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 8 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  11m 28s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 19s |  |  trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  compile  |  17m 59s |  |  trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   2m 49s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  2s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  3s |  |  trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +0 :ok: |  spotbugs  |   1m 10s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 22s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 42s |  |  the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javac  |  20m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 11s |  |  the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  javac  |  18m 11s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 46s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/diff-checkstyle-root.txt) |  root: The patch generated 8 new + 30 unchanged - 1 fixed = 38 total (was 31)  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/whitespace-eol.txt) |  The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  16m 37s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | -1 :x: |  javadoc  |   0m 34s | [/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt) |  hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 generated 6 new + 88 unchanged - 0 fixed = 94 total (was 88)  |
   | +1 :green_heart: |  findbugs  |   3m 41s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |   9m 42s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) |  hadoop-common in the patch failed.  |
   | +1 :green_heart: |  unit  |   1m 36s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 189m 42s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.ha.TestZKFailoverController |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 9c5c2f56155c 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ae74407ac43 |
   | Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/testReport/ |
   | Max. process+thread count | 3233 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522961040



##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
##########
@@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable {
 
   }
 
+
+  /**
+   * Run two jobs with the same destination and different output paths.
+   * <p></p>
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   * <p></p>
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+    describe("Run two jobs to the same destination, assert they both complete");
+    Configuration conf = getConfiguration();
+    conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+    // this job has a job ID generated and set as the spark UUID;
+    // the config is also set to require it.
+    // This mimics the Spark setup process.
+
+    String stage1Id = UUID.randomUUID().toString();
+    conf.set(SPARK_WRITE_UUID, stage1Id);
+    conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+    // create the job and write data in its task attempt
+    JobData jobData = startJob(true);
+    Job job1 = jobData.job;
+    AbstractS3ACommitter committer1 = jobData.committer;
+    JobContext jContext1 = jobData.jContext;
+    TaskAttemptContext tContext1 = jobData.tContext;
+    Path job1TaskOutputFile = jobData.writtenTextPath;
+
+    // the write path
+    Assertions.assertThat(committer1.getWorkPath().toString())
+        .describedAs("Work path of %s", committer1)
+        .contains(stage1Id);
+    // now build up a second job
+    String jobId2 = randomJobId();
+
+    // second job will use same ID
+    String attempt2 = taskAttempt0.toString();
+    TaskAttemptID taskAttempt2 = taskAttempt0;
+
+    // create the second job
+    Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+    c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+    Job job2 = newJob(outDir,
+        c2,
+        attempt2);
+    Configuration conf2 = job2.getConfiguration();
+    conf2.set("mapreduce.output.basename", "task2");
+    String stage2Id = UUID.randomUUID().toString();
+    conf2.set(SPARK_WRITE_UUID,
+        stage2Id);
+
+    JobContext jContext2 = new JobContextImpl(conf2,
+        taskAttempt2.getJobID());
+    TaskAttemptContext tContext2 =
+        new TaskAttemptContextImpl(conf2, taskAttempt2);
+    AbstractS3ACommitter committer2 = createCommitter(outDir, tContext2);
+    Assertions.assertThat(committer2.getJobAttemptPath(jContext2))
+        .describedAs("Job attempt path of %s", committer2)
+        .isNotEqualTo(committer1.getJobAttemptPath(jContext1));
+    Assertions.assertThat(committer2.getTaskAttemptPath(tContext2))
+        .describedAs("Task attempt path of %s", committer2)
+        .isNotEqualTo(committer1.getTaskAttemptPath(tContext1));
+    Assertions.assertThat(committer2.getWorkPath().toString())
+        .describedAs("Work path of %s", committer2)
+        .isNotEqualTo(committer1.getWorkPath().toString())
+        .contains(stage2Id);
+    Assertions.assertThat(committer2.getUUIDSource())
+        .describedAs("UUID source of %s", committer2)
+        .isEqualTo(AbstractS3ACommitter.JobUUIDSource.SparkWriteUUID);
+    JobData jobData2 = new JobData(job2, jContext2, tContext2, committer2);
+    setup(jobData2);
+    abortInTeardown(jobData2);
+
+    // the sequence is designed to ensure that job2 has active multipart
+    // uploads during/after job1's work
+
+    // if the committer is a magic committer, MPUs start in the write,
+    // otherwise in task commit.
+    boolean multipartInitiatedInWrite =
+        committer2 instanceof MagicS3GuardCommitter;
+
+    // job2. Here we start writing a file and have that write in progress
+    // when job 1 commits.
+
+    LoggingTextOutputFormat.LoggingLineRecordWriter<Object, Object>
+        recordWriter2 = new LoggingTextOutputFormat<>().getRecordWriter(
+            tContext2);
+
+    LOG.info("Commit Task 1");
+    commitTask(committer1, tContext1);
+
+    if (multipartInitiatedInWrite) {
+      // magic committer runs -commit job1 while a job2 TA has an open
+      // writer (and hence: open MP Upload)
+      LOG.info("Commit Job 1");

Review comment:
       done






[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-728070916


   Pushed up an iteration with all the feedback addressed
   
   testing: s3 london, unguarded, markers=keep
   downstream testing (which now includes a test to generate 10K Job IDs through the spark API and verify they are different): s3 ireland, unguarded, markers = delete




[GitHub] [hadoop] dongjoon-hyun commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-730470829


   Thank you, @steveloughran and guys!




[GitHub] [hadoop] hadoop-yetus commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724917048


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 44s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 55s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  27m 52s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  25m 20s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m 52s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   3m  7s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 14s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 13s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 36s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   4m  6s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  24m 37s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  24m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 13s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  20m 13s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 53s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/diff-checkstyle-root.txt) |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/whitespace-eol.txt) |  The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  16m 51s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |   4m 44s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  10m 44s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 37s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 55s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 215m 31s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 667abdab7b52 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4331c88352d |
   | Default Java | Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/testReport/ |
   | Max. process+thread count | 1327 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522948661



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -147,6 +173,11 @@ protected AbstractS3ACommitter(
     this.jobContext = context;
     this.role = "Task committer " + context.getTaskAttemptID();
     setConf(context.getConfiguration());
+    Pair<String, JobUUIDSource> id = buildJobUUID(
+        conf, context.getJobID());
+    uuid = id.getLeft();
+    uuidSource = id.getRight();

Review comment:
       Makes sense in the constructor. Done






[GitHub] [hadoop] hadoop-yetus commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720614677


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 56s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 58s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 27s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  26m 16s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  20m 17s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 18s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  7s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 31s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   4m 14s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  25m 39s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  25m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 42s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  22m 42s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   3m 23s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/diff-checkstyle-root.txt) |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 56s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/whitespace-eol.txt) |  The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  3s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  18m 16s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | -1 :x: |  javadoc  |   0m 38s | [/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt) |  hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 generated 4 new + 88 unchanged - 0 fixed = 92 total (was 88)  |
   | +1 :green_heart: |  findbugs  |   4m 18s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  11m  6s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) |  hadoop-common in the patch failed.  |
   | -1 :x: |  unit  |   1m 48s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) |  hadoop-aws in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 52s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 224m 49s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.security.TestLdapGroupsMapping |
   |   | hadoop.fs.s3a.commit.staging.TestStagingCommitter |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 54e02b38280c 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / deea5d8f2ba |
   | Default Java | Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/testReport/ |
   | Max. process+thread count | 3233 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-726048699


   thanks. will go through comments and apply before merging




[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-730407746


   Merged to trunk, not yet 3.3. See #2473 for the test failure caused in code from a different PR *which this patch goes nowhere near*.




[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720638320








[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs slightly better.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-713128664


   FYI @dongjoon-hyun 




[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522098602



##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
##########
@@ -535,20 +535,28 @@ Conflict management is left to the execution engine itself.
 
 | Option | Magic | Directory | Partitioned | Meaning | Default |
 |--------|-------|-----------|-------------|---------|---------|
-| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file  at the end of each job | `true` |
+| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file on the successful completion of the job. | `true` |
+| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | `${hadoop.tmp.dir}/s3a` |
+| `fs.s3a.committer.magic.enabled` | X |  | | Enable "magic committer" support in the filesystem. | `false` |
+| `fs.s3a.committer.abort.pending.uploads` | X | X | X | list and abort all pending uploads under the destination path when the job is committed or aborted. | `true` |
 | `fs.s3a.committer.threads` | X | X | X | Number of threads in committers for parallel operations on files. | 8 |
-| `fs.s3a.committer.staging.conflict-mode` |  | X | X | Conflict resolution: `fail`, `append` or `replace`| `append` |
-| `fs.s3a.committer.staging.unique-filenames` |  | X | X | Generate unique filenames | `true` |
-| `fs.s3a.committer.magic.enabled` | X |  | | Enable "magic committer" support in the filesystem | `false` |
+| `fs.s3a.committer.generate.uuid` |  | X | X | Generate a Job UUID if none is passed down from Spark | `false` |
+| `fs.s3a.committer.require.uuid` |  | X | X | Require the Job UUID to be passed down from Spark | `false` |
 
 
+Staging committer (Directory and Partitioned) options
 
 
 | Option | Magic | Directory | Partitioned | Meaning | Default |
 |--------|-------|-----------|-------------|---------|---------|
-| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | |
-| `fs.s3a.committer.staging.tmp.path` |  | X | X | Path in the cluster filesystem for temporary data | `tmp/staging` |
 
+| `fs.s3a.committer.staging.conflict-mode` |  | X | X | Conflict resolution: `fail`, `append` or `replace`| `append` |

Review comment:
       done. Also reviewed both tables, removed those columns about which committer supports what option, now they are split into common and staging






[GitHub] [hadoop] hadoop-yetus commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs slightly better.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-713203098


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 18s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  10m 41s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m  3s |  |  trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  compile  |  18m 17s |  |  trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   2m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 13s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  7s |  |  trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +0 :ok: |  spotbugs  |   1m 12s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 28s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 40s |  |  the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javac  |  20m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 14s |  |  the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  javac  |  18m 14s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   2m 44s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/1/artifact/out/whitespace-eol.txt) |  The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  16m 21s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | -1 :x: |  javadoc  |   0m 35s | [/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/1/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt) |  hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 generated 4 new + 88 unchanged - 0 fixed = 92 total (was 88)  |
   | +1 :green_heart: |  findbugs  |   3m 41s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  10m 23s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 20s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 47s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 190m 34s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 50acdedcd6d6 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7b4359657f8 |
   | Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/1/testReport/ |
   | Max. process+thread count | 2113 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/1/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522093816



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SuccessData.java
##########
@@ -68,7 +68,7 @@
   /**
    * Serialization ID: {@value}.
    */
-  private static final long serialVersionUID = 507133045258460084L;
+  private static final long serialVersionUID = 507133045258460083L + VERSION;

Review comment:
    This is only for Java serialization, obviously. It's to make sure anyone (me) who might pass them around in Spark RDDs won't create serialization problems. FWIW I use the JSON format in those cloud committer tests, primarily to verify the correctness of the committer name.
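    
    A minimal sketch of the pattern (simplified from SuccessData; the VERSION
    value shown here is illustrative, not the class's actual constant):
    
        // Tie the Java serialVersionUID to the format VERSION constant, so
        // that bumping VERSION also invalidates any Java-serialized
        // instances, e.g. objects shipped around in Spark RDDs.
        public class SuccessData implements java.io.Serializable {
          /** Format version; illustrative value. */
          public static final int VERSION = 1;
          private static final long serialVersionUID =
              507133045258460083L + VERSION;
        }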






[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522947215



##########
File path: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
##########
@@ -1925,20 +1925,13 @@
 </property>
 
 <property>
-  <name>fs.s3a.committer.staging.abort.pending.uploads</name>
+  <name>fs.s3a.committer.abort.pending.uploads</name>
   <value>true</value>
   <description>
-    Should the staging committers abort all pending uploads to the destination
+    Should the committers abort all pending uploads to the destination
     directory?
 
-    Changing this if more than one partitioned committer is
-    writing to the same destination tree simultaneously; otherwise
-    the first job to complete will cancel all outstanding uploads from the
-    others. However, it may lead to leaked outstanding uploads from failed
-    tasks. If disabled, configure the bucket lifecycle to remove uploads
-    after a time period, and/or set up a workflow to explicitly delete
-    entries. Otherwise there is a risk that uncommitted uploads may run up
-    bills.
+    Set to false if more than one job is writing to the same directory tree.

Review comment:
       taskAbort, yes. JobAbort/cleanup is where things are more troublesome, because the job doesn't know which specific task attempts have uploaded.
   
   With the staging committer, there are no files uploaded until task commit. Tasks which fail before that moment don't have any pending uploads to cancel.
   With the magic committer, because the files are written directly to S3, there is more risk of pending uploads accumulating.
   
   I'm not sure about spark here, but on MR, when a task is considered to have failed, abortTask is called in the AM to abort that specific task; for the magic committer the task's set of .pending files is determined by listing the task attempt dir, and those uploads are cancelled. If that operation is called reliably, only the current upload is pending.
   
   Of course, if an entire job fails: no cleanup at all.
   
   The best thing to do is simply to tell everyone to have a scheduled cleanup.
   
   FWIW, the most leakage I see in the real world is actually from incomplete S3ABlockOutputStream writes, as again, they accrue bills. Everyone needs a lifecycle rule to delete old ones. The sole exception there is a bucket our QE team used which (unknown to them) I'd use for testing the scalability of the "hadoop s3guard uploads" command - how well does it work when there are many, many incomplete uploads, can it still delete them all, etc. If they had a rule then it'd screw up my test runs.
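
   To make the "scheduled cleanup" advice concrete: a sketch of installing a bucket lifecycle rule that aborts incomplete multipart uploads after a week, written against the AWS SDK v1 classes bundled with hadoop-aws of this era. The exact builder calls are an assumption to verify against the SDK version in use; the same rule can also be set from the S3 console or the aws CLI.
   
   ```java
   import com.amazonaws.services.s3.AmazonS3;
   import com.amazonaws.services.s3.model.AbortIncompleteMultipartUpload;
   import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
   import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter;
   import com.amazonaws.services.s3.model.lifecycle.LifecyclePrefixPredicate;
   
   import java.util.Collections;
   
   public final class MpuCleanup {
   
     // Abort any multipart upload still incomplete 7 days after initiation,
     // so uploads leaked by failed jobs stop accruing storage charges.
     static void installRule(AmazonS3 s3, String bucket) {
       BucketLifecycleConfiguration.Rule rule =
           new BucketLifecycleConfiguration.Rule()
               .withId("abort-stale-multipart-uploads")
               .withFilter(new LifecycleFilter(new LifecyclePrefixPredicate("")))
               .withAbortIncompleteMultipartUpload(
                   new AbortIncompleteMultipartUpload().withDaysAfterInitiation(7))
               .withStatus(BucketLifecycleConfiguration.ENABLED);
       s3.setBucketLifecycleConfiguration(bucket,
           new BucketLifecycleConfiguration(Collections.singletonList(rule)));
     }
   }
   ```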






[GitHub] [hadoop] mehakmeet commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
mehakmeet commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r521770766



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
     }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>

Review comment:
       nit: Should this be "If disabled"? Also, what is the property we are talking about that is enabled or not - is it FS_S3A_COMMITTER_GENERATE_UUID? If so, we should mention it here too, I think.

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
     }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
+   * </ol>
+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched to
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in.
+   * <p>
+   *   If the option {@link CommitConstants#FS_S3A_COMMITTER_REQUIRE_UUID}
+   *   is set, then an external UUID MUST be passed in.
+   *   This can be used to verify that the spark engine is reliably setting
+   *   unique IDs for staging.
+   * </p>
+   * @param conf job/task configuration
+   * @param jobId job ID from YARN or spark.
+   * @return Job UUID and source of it.
+   * @throws PathCommitException no UUID was found and it was required
+   */
+  public static Pair<String, JobUUIDSource>
+      buildJobUUID(Configuration conf, JobID jobId)
+      throws PathCommitException {
+
+    String jobUUID = conf.getTrimmed(FS_S3A_COMMITTER_UUID, "");
+
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.CommitterUUIDProperty);
+    }
+    // there is no job UUID.
+    // look for one from spark
+    jobUUID = conf.getTrimmed(SPARK_WRITE_UUID, "");
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.SparkWriteUUID);
+    }
+
+    // there is no UUID configuration in the job/task config
+
+    // Check the job hasn't declared a requirement for the UUID.
+    // This allows for fail-fast validation of Spark behavior.
+    if (conf.getBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, false)) {
+      throw new PathCommitException("", E_NO_SPARK_UUID);
+    }
+
+    // see if the job can generate a random UUID
+    if (conf.getBoolean(FS_S3A_COMMITTER_GENERATE_UUID, false)) {
+      // generate a random UUID. This is OK for a job, for a task
+      // it means that the data may not get picked up.
+      String newId = UUID.randomUUID().toString();
+      LOG.warn("No job ID in configuration; generating a randem ID: {}",

Review comment:
       nit: typo in "random"






[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724023990


   Test run with: -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep -Ds3guard -Ddynamo  -Dfs.s3a.directory.marker.audit=true -Dscale
   
   ```
   [INFO] 
   [ERROR] Failures: 
   [ERROR]   ITestS3AContractUnbuffer>AbstractContractUnbufferTest.testUnbufferBeforeRead:63->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 failed to read expected number of bytes from stream. This may be transient expected:<1024> but was:<93>
   [ERROR]   ITestS3AContractUnbuffer>AbstractContractUnbufferTest.testUnbufferOnClosedFile:83->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 failed to read expected number of bytes from stream. This may be transient expected:<1024> but was:<605>
   [INFO] 
   [ERROR] Tests run: 1379, Failures: 2, Errors: 0, Skipped: 153
   [INFO] 
   ```
   
   My next big bit of work is to do tests in spark itself




[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-727939820


   @rdblue
   
   Yes, I did a bit more than was needed because I had to also let > 1 magic committer commit work side-by-side (all that active upload warning), and the IDE was trying to keep me in check too, on a piece of code which hasn't been revisited for a while.
   
   While I had the files open in the IDE, I moved to passing FileStatus down to line up with the changes in #2168 - if you open a file through the JsonSerializer by passing in the FileStatus, that will be handed off to the FileSystem's implementation of openFile(status.path).withFileStatus(status), and so be used by the S3A FS to skip the initial HEAD request. That means if we are reading 1000 .pendingset files in S3A, we eliminate 1000 HEAD calls, which should have tangible benefits for committers using S3 as the place to keep those files.
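
   A minimal sketch of the pattern described above, as seen by a client of the Hadoop FileSystem builder API (Hadoop 3.3+; error handling trimmed):
   
   ```java
   import org.apache.hadoop.fs.FSDataInputStream;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.FileSystem;
   
   public final class OpenWithStatus {
   
     // The FileStatus normally comes from a directory listing, so passing it
     // down lets the S3A connector skip the HEAD probe it would otherwise
     // issue when opening the file.
     static FSDataInputStream open(FileSystem fs, FileStatus st)
         throws Exception {
       return fs.openFile(st.getPath())
           .withFileStatus(st)
           .build()   // returns a CompletableFuture<FSDataInputStream>
           .get();
     }
   }
   ```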




[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724222044


   Running integration tests on this with spark + patch and the 3.4.0-SNAPSHOT builds. Ignoring compilation issues with spark trunk, hadoop-trunk, scala versions and scalatest, I'm running tests in [cloud-integration](https://github.com/hortonworks-spark/cloud-integration)
   
   ```
   S3AParquetPartitionSuite:
   2020-11-09 10:55:36,664 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  commit.AbstractS3ACommitter (AbstractS3ACommitter.java:<init>(180)) - Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID
   2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  output.FileOutputCommitter (FileOutputCommitter.java:<init>(141)) - File Output Committer Algorithm version is 1
   2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  output.FileOutputCommitter (FileOutputCommitter.java:<init>(156)) - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
   2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  commit.AbstractS3ACommitterFactory (S3ACommitterFactory.java:createTaskCommitter(83)) - Using committer directory to output data to s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
   2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  commit.AbstractS3ACommitterFactory (AbstractS3ACommitterFactory.java:createOutputCommitter(54)) - Using Committer StagingCommitter{AbstractS3ACommitter{role=Task committer attempt_20201109105536_0000_m_000000_0, name=directory, outputPath=s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo, workPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536_0000_m_000000_0/_temporary/0/_temporary/attempt_20201109105536_0000_m_000000_0, uuid='d6b6cd70-0303-46a6-8ff4-240dd14511d6', uuid source=JobUUIDSource{text='spark.sql.sources.writeJobUUID'}}, commitsDirectory=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads, uniqueFilenames=true, conflictResolution=APPEND. uploadPartS
 ize=67108864, wrappedCommitter=FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl{jobId=job_20201109105536_0000}; taskId=attempt_20201109105536_0000_m_000000_0, status=''}; org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter@759c53e5}; outputPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads, workPath=null, algorithmVersion=1, skipCleanup=false, ignoreCleanupFailures=false}} for s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
   2020-11-09 10:55:36,736 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  staging.DirectoryStagingCommitter (DirectoryStagingCommitter.java:setupJob(71)) - Conflict Resolution mode is APPEND
   2020-11-09 10:55:36,879 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  commit.AbstractS3AC
   ```
   
   1. Spark is passing down a unique job ID (the committer is configured to require it): ` Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID`
   2. This is used for the local fs work path of the staging committer: `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536_0000_m_000000_0/_temporary/0/_temporary/attempt_20201109105536_0000_m_000000_0`
   3. And for the cluster FS (which is file:// here):
   `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads`




[GitHub] [hadoop] hadoop-yetus commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724951030








[GitHub] [hadoop] hadoop-yetus commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-728238352


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  34m  5s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  2s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  27m 52s |  |  trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  compile  |  24m 10s |  |  trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   3m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 38s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  27m  1s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m 43s |  |  trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  spotbugs  |   1m 42s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   4m 40s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  23m 37s |  |  the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javac  |  23m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 12s |  |  the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  javac  |  18m 12s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 46s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/diff-checkstyle-root.txt) |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/whitespace-eol.txt) |  The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  16m 58s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m 43s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   9m 43s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 35s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 257m 18s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 4ee065eae8d6 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / dd85a90da6f |
   | Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/testReport/ |
   | Max. process+thread count | 3085 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs slightly better.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-713125882


   Tested OK with -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep -Ds3guard -Ddynamo  -Dfs.s3a.directory.marker.audit=true -Dscale
   
   now retesting with delete and unguarded




[GitHub] [hadoop] hadoop-yetus removed a comment on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720614677


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 56s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 58s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 27s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  26m 16s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  20m 17s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 18s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  7s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 31s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   4m 14s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  25m 39s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  25m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 42s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  22m 42s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   3m 23s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/diff-checkstyle-root.txt) |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 56s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/whitespace-eol.txt) |  The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  3s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  18m 16s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | -1 :x: |  javadoc  |   0m 38s | [/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt) |  hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 generated 4 new + 88 unchanged - 0 fixed = 92 total (was 88)  |
   | +1 :green_heart: |  findbugs  |   4m 18s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  11m  6s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  |   1m 48s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 52s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 224m 49s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.security.TestLdapGroupsMapping |
   |   | hadoop.fs.s3a.commit.staging.TestStagingCommitter |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 54e02b38280c 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / deea5d8f2ba |
   | Default Java | Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/testReport/ |
   | Max. process+thread count | 3233 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724786879


   Some more detail for the watchers from my testing (hadoop-trunk + CDP spark 2.4). I could not get spark master and hadoop trunk to build together this week.
   
   * RDD.saveAs needs to pass down the setting too [https://issues.apache.org/jira/browse/SPARK-33402](https://issues.apache.org/jira/browse/SPARK-33402)
   * I'm getting errors with FileSystem instantiation in Hive and the isolated classloader [https://issues.apache.org/jira/browse/HADOOP-17372](https://issues.apache.org/jira/browse/HADOOP-17372).
   
   I'm not going near that other than to add a para in troubleshooting.md saying "you're in classloader hell". Will need to be testing against spark master before worrying about WTF is going on there.
   
   I'm also now worried that if anyone does >1 job with the same dest dir and overwrite=true, then there's a risk of the same duplicate-app-attempt-ID race condition. It's tempting just to do something ambitious like use a random number to generate a timestamp for the cluster launch, or some random(year-month-day) + seconds-of-day, so that this problem goes away almost completely.




[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522089565



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
##########
@@ -585,7 +589,8 @@ public BulkOperationState initiateOperation(final Path path,
   @Retries.RetryTranslated
   public UploadPartResult uploadPart(UploadPartRequest request)
       throws IOException {
-    return retry("upload part",
+    return retry("upload part #" + request.getPartNumber()
+        + " upload "+ request.getUploadId(),

Review comment:
       This is the S3 multipart upload ID, so I'll use "upload ID" for it... it's also used in BlockOutputStream
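
   For context on why "part # + upload ID" pins down the retried operation: a sketch of the SDK v1 multipart flow, where every part request carries the uploadId returned at initiation (the SDK classes are real; the wrapper method is illustrative):
   
   ```java
   import com.amazonaws.services.s3.AmazonS3;
   import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
   import com.amazonaws.services.s3.model.UploadPartRequest;
   
   import java.io.File;
   
   public final class MpuPartDemo {
   
     // The uploadId is the only durable handle on an in-flight multipart
     // upload, so "part #N of upload <id>" uniquely identifies the work
     // being retried in the logs.
     static void uploadFirstPart(AmazonS3 s3, String bucket, String key,
         File part) {
       String uploadId = s3.initiateMultipartUpload(
           new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
       s3.uploadPart(new UploadPartRequest()
           .withBucketName(bucket)
           .withKey(key)
           .withUploadId(uploadId)
           .withPartNumber(1)
           .withFile(part));
     }
   }
   ```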






[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r523130440



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -411,26 +464,63 @@ protected void maybeCreateSuccessMarker(JobContext context,
    * be deleted; creating it now ensures there is something at the end
    * while the job is in progress -and if nothing is created, that
    * it is still there.
+   * <p>
+   *   The option {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}
+   *   is set to the job UUID; if generated locally
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID} is also patched.
+   *   The field {@link #jobSetup} is set to true to note that
+   *   this specific committer instance was used to set up a job.
+   * </p>
    * @param context context
    * @throws IOException IO failure
    */
 
   @Override
   public void setupJob(JobContext context) throws IOException {
-    try (DurationInfo d = new DurationInfo(LOG, "preparing destination")) {
+    try (DurationInfo d = new DurationInfo(LOG,
+        "Job %s setting up", getUUID())) {
+      // record that the job has been set up
+      jobSetup = true;
+      // patch job conf with the job UUID.
+      Configuration c = context.getConfiguration();
+      c.set(FS_S3A_COMMITTER_UUID, this.getUUID());
+      if (getUUIDSource() == JobUUIDSource.GeneratedLocally) {
+        // we set the UUID up locally. Save it back to the job configuration
+        c.set(SPARK_WRITE_UUID, this.getUUID());

Review comment:
       I was just trying to be rigorous; will roll back. While I'm there I think I'll add the source attribute - I can then probe for it in the tests. I'm already saving it in the _SUCCESS file.
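
   A sketch of the kind of test probe this enables, loading the JSON _SUCCESS marker and inspecting what the committer recorded; the SuccessData accessors assumed here should be checked against the hadoop-aws version in use:
   
   ```java
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.fs.s3a.commit.files.SuccessData;
   
   public final class SuccessCheck {
   
     // For the S3A committers, _SUCCESS is a JSON file rather than a
     // zero-byte marker, so a test can assert on the committer name (and,
     // with this patch, the job UUID and its source) after a run.
     static void verify(FileSystem fs, Path outputDir) throws Exception {
       SuccessData data = SuccessData.load(fs, new Path(outputDir, "_SUCCESS"));
       System.out.println("committer: " + data.getCommitter());
     }
   }
   ```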

##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md
##########
@@ -248,6 +247,47 @@ As an example, the endpoint for S3 Frankfurt is `s3.eu-central-1.amazonaws.com`:
 </property>
 ```
 
+### `Class does not implement AWSCredentialsProvider`

Review comment:
       going to add that specific bit about spark hive classloaders here too, which is where this is coming from

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
     }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
+   * </ol>
+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched to
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in.
+   * <p>
+   *   If the option {@link CommitConstants#FS_S3A_COMMITTER_REQUIRE_UUID}
+   *   is set, then an external UUID MUST be passed in.
+   *   This can be used to verify that the spark engine is reliably setting
+   *   unique IDs for staging.
+   * </p>
+   * @param conf job/task configuration
+   * @param jobId job ID from YARN or spark.
+   * @return Job UUID and source of it.
+   * @throws PathCommitException no UUID was found and it was required
+   */
+  public static Pair<String, JobUUIDSource>
+      buildJobUUID(Configuration conf, JobID jobId)
+      throws PathCommitException {
+
+    String jobUUID = conf.getTrimmed(FS_S3A_COMMITTER_UUID, "");
+
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.CommitterUUIDProperty);
+    }
+    // there is no job UUID.
+    // look for one from spark
+    jobUUID = conf.getTrimmed(SPARK_WRITE_UUID, "");
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.SparkWriteUUID);
+    }
+
+    // there is no UUID configuration in the job/task config
+
+    // Check the job hasn't declared a requirement for the UUID.
+    // This allows for fail-fast validation of Spark behavior.
+    if (conf.getBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, false)) {
+      throw new PathCommitException("", E_NO_SPARK_UUID);
+    }
+
+    // see if the job can generate a random UUID
+    if (conf.getBoolean(FS_S3A_COMMITTER_GENERATE_UUID, false)) {

Review comment:
       MR jobs are the case where an updated config doesn't get through to the tasks: use a self-generated ID there and things won't work. And as MR jobs know that the app ID is unique on that YARN cluster, that's all they need.
   
   For my spark integration tests I turned off auto-generation and enabled the fail-if-no-job-UUID option, to verify that all operations (RDD, dataframe, dataset, sql) were passing down the spark.sql option. That helped me find out where it wasn't being set.
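
   For reference, a sketch of how the two knobs contrasted here might be set; the string keys are my reading of FS_S3A_COMMITTER_REQUIRE_UUID and FS_S3A_COMMITTER_GENERATE_UUID and worth double-checking against the merged constants:
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   
   public final class CommitterUuidConf {
   
     static Configuration sparkTestConf() {
       Configuration conf = new Configuration();
       // Spark test runs: fail fast if the engine didn't pass down a write
       // UUID - this is how the gaps in RDD/dataframe/dataset/sql plumbing
       // were found.
       conf.setBoolean("fs.s3a.committer.require.uuid", true);
       // MR-style deployments leave self-generation off and rely on the
       // YARN application ID, which is already unique per cluster.
       conf.setBoolean("fs.s3a.committer.generate.uuid", false);
       return conf;
     }
   }
   ```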

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
     }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
+   * </ol>
+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched to
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in.
+   * <p>
+   *   If the option {@link CommitConstants#FS_S3A_COMMITTER_REQUIRE_UUID}
+   *   is set, then an external UUID MUST be passed in.
+   *   This can be used to verify that the spark engine is reliably setting
+   *   unique IDs for staging.
+   * </p>
+   * @param conf job/task configuration
+   * @param jobId job ID from YARN or spark.
+   * @return Job UUID and source of it.
+   * @throws PathCommitException no UUID was found and it was required
+   */
+  public static Pair<String, JobUUIDSource>
+      buildJobUUID(Configuration conf, JobID jobId)
+      throws PathCommitException {
+
+    String jobUUID = conf.getTrimmed(FS_S3A_COMMITTER_UUID, "");
+
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.CommitterUUIDProperty);
+    }
+    // there is no job UUID.
+    // look for one from spark
+    jobUUID = conf.getTrimmed(SPARK_WRITE_UUID, "");
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.SparkWriteUUID);

Review comment:
       Removed it there.






[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522091672



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitConstants.java
##########
@@ -264,4 +283,25 @@ private CommitConstants() {
   /** Extra Data key for task attempt in pendingset files. */
   public static final String TASK_ATTEMPT_ID = "task.attempt.id";
 
+  /**
+   * Require the spark UUID to be passed down: {@value}.
+   * This is to verify that SPARK-33230 has been applied to spark, and that
+   * {@link InternalCommitterConstants#SPARK_WRITE_UUID} is set.
+   * <p>
+   *   MUST ONLY BE SET WITH SPARK JOBS.
+   * </p>
+   */

Review comment:
       +1. Adding two new constants and referring to them in the production code.






[GitHub] [hadoop] dongjoon-hyun commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs slightly better.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-713170304


   Thank you for pinging me, @steveloughran .
   
   cc @sunchao , too.




[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522970306



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
     }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>

Review comment:
       added the extra details






[GitHub] [hadoop] hadoop-yetus removed a comment on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724917048








[GitHub] [hadoop] hadoop-yetus removed a comment on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724104455


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  9s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 46s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m  1s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 27s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m 11s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 15s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  2s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 10s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 24s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 16s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  22m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 30s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  19m 30s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 56s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/diff-checkstyle-root.txt) |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 42s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  19m 52s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 15s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |   5m 24s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  11m 40s |  |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  |   1m 49s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 205m 19s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.fs.s3a.commit.staging.TestStagingCommitter |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux c32f6d9525bc 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ae7b00a9988 |
   | Default Java | Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/testReport/ |
   | Max. process+thread count | 1327 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran closed pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran closed pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399


   




[GitHub] [hadoop] dongjoon-hyun commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs slightly better.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-716779896


   FYI, I merged SPARK-33230, @steveloughran.




[GitHub] [hadoop] dongjoon-hyun commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724316222


   Thank you for sharing, @steveloughran!




[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522092840



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/InternalCommitterConstants.java
##########
@@ -97,4 +97,30 @@ private InternalCommitterConstants() {
   /** Error message for a path without a magic element in the list: {@value}. */
   public static final String E_NO_MAGIC_PATH_ELEMENT
       = "No " + MAGIC + " element in path";
+
+  /**
+   * The UUID for jobs: {@value}.
+   * This was historically created in Spark 1.x's SQL queries, but "went away".
+   */
+  public static final String SPARK_WRITE_UUID =
+      "spark.sql.sources.writeJobUUID";
+
+  /**
+   * The App ID for jobs: {@value}.
+   */
+  public static final String SPARK_APP_ID = "spark.app.id";

Review comment:
       Cut it. This was a very old property passed down by Spark.






[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724791317


   Latest test run against S3 London, no S3Guard; markers deleted (classic config). Everything, even the flaky read() tests, passed!
   
    -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=delete  -Dfs.s3a.directory.marker.audit=true -Dscale




[GitHub] [hadoop] rdblue commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522277893



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -147,6 +173,11 @@ protected AbstractS3ACommitter(
     this.jobContext = context;
     this.role = "Task committer " + context.getTaskAttemptID();
     setConf(context.getConfiguration());
+    Pair<String, JobUUIDSource> id = buildJobUUID(
+        conf, context.getJobID());
+    uuid = id.getLeft();
+    uuidSource = id.getRight();

Review comment:
       Other places use `this.` as a prefix when setting fields. I find that helpful when reading to know that an instance field is being set, vs a local variable.

##########
File path: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
##########
@@ -1925,20 +1925,13 @@
 </property>
 
 <property>
-  <name>fs.s3a.committer.staging.abort.pending.uploads</name>
+  <name>fs.s3a.committer.abort.pending.uploads</name>
   <value>true</value>
   <description>
-    Should the staging committers abort all pending uploads to the destination
+    Should the committers abort all pending uploads to the destination
     directory?
 
-    Changing this if more than one partitioned committer is
-    writing to the same destination tree simultaneously; otherwise
-    the first job to complete will cancel all outstanding uploads from the
-    others. However, it may lead to leaked outstanding uploads from failed
-    tasks. If disabled, configure the bucket lifecycle to remove uploads
-    after a time period, and/or set up a workflow to explicitly delete
-    entries. Otherwise there is a risk that uncommitted uploads may run up
-    bills.
+    Set to false if more than one job is writing to the same directory tree.

Review comment:
       Committers don't cancel just their own pending uploads?
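
   For illustration only — a minimal, hypothetical sketch of a job opting out of the abort-all behaviour when several jobs share a destination tree (the option name and default are taken from the diff above; the class name is invented):

   ```
   import org.apache.hadoop.conf.Configuration;

   // Hypothetical sketch: opt a job out of aborting *all* pending uploads
   // under the destination, so a concurrent job's in-flight multipart
   // uploads are left alone at job commit/abort.
   public class ConcurrentCommitConfSketch {
     public static void main(String[] args) {
       Configuration conf = new Configuration();
       // Renamed option from this PR; the default is true.
       conf.setBoolean("fs.s3a.committer.abort.pending.uploads", false);
       System.out.println(conf.getBoolean(
           "fs.s3a.committer.abort.pending.uploads", true));
     }
   }
   ```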

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -411,26 +464,63 @@ protected void maybeCreateSuccessMarker(JobContext context,
    * be deleted; creating it now ensures there is something at the end
    * while the job is in progress -and if nothing is created, that
    * it is still there.
+   * <p>
+   *   The option {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}
+   *   is set to the job UUID; if generated locally
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID} is also patched.
+   *   The field {@link #jobSetup} is set to true to note that
+   *   this specific committer instance was used to set up a job.
+   * </p>
    * @param context context
    * @throws IOException IO failure
    */
 
   @Override
   public void setupJob(JobContext context) throws IOException {
-    try (DurationInfo d = new DurationInfo(LOG, "preparing destination")) {
+    try (DurationInfo d = new DurationInfo(LOG,
+        "Job %s setting up", getUUID())) {
+      // record that the job has been set up
+      jobSetup = true;
+      // patch job conf with the job UUID.
+      Configuration c = context.getConfiguration();
+      c.set(FS_S3A_COMMITTER_UUID, this.getUUID());
+      if (getUUIDSource() == JobUUIDSource.GeneratedLocally) {
+        // we set the UUID up locally. Save it back to the job configuration
+        c.set(SPARK_WRITE_UUID, this.getUUID());

Review comment:
       It seems odd to set the Spark property. Does anything else use this?

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
     }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
+   * </ol>
+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched with the job UUID, so that it will
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in.
+   * <p>
+   *   If the option {@link CommitConstants#FS_S3A_COMMITTER_REQUIRE_UUID}
+   *   is set, then an external UUID MUST be passed in.
+   *   This can be used to verify that the spark engine is reliably setting
+   *   unique IDs for staging.
+   * </p>
+   * @param conf job/task configuration
+   * @param jobId job ID from YARN or spark.
+   * @return Job UUID and source of it.
+   * @throws PathCommitException no UUID was found and it was required
+   */
+  public static Pair<String, JobUUIDSource>
+      buildJobUUID(Configuration conf, JobID jobId)
+      throws PathCommitException {
+
+    String jobUUID = conf.getTrimmed(FS_S3A_COMMITTER_UUID, "");
+
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.CommitterUUIDProperty);
+    }
+    // there is no job UUID.
+    // look for one from spark
+    jobUUID = conf.getTrimmed(SPARK_WRITE_UUID, "");
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.SparkWriteUUID);
+    }
+
+    // there is no UUID configuration in the job/task config
+
+    // Check the job hasn't declared a requirement for the UUID.
+    // This allows for fail-fast validation of Spark behavior.
+    if (conf.getBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, false)) {
+      throw new PathCommitException("", E_NO_SPARK_UUID);
+    }
+
+    // see if the job can generate a random UUID
+    if (conf.getBoolean(FS_S3A_COMMITTER_GENERATE_UUID, false)) {

Review comment:
       Why would a committer not want to generate a unique ID and use the job ID instead?
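
   For context while reading the resolution order above: both UUID options are opt-in, and in the code quoted here the `require` check runs before the `generate` check, so if both are set and no UUID is passed down, the committer fails fast. A hedged sketch of the two settings (option names from the committers.md table in this PR; the class name is invented):

   ```
   import org.apache.hadoop.conf.Configuration;

   // Sketch only: the two opt-in UUID options from this PR.
   public class JobUuidOptionsSketch {
     public static void main(String[] args) {
       Configuration conf = new Configuration();
       // Fail fast if no UUID was passed down (checked first in buildJobUUID):
       conf.setBoolean("fs.s3a.committer.require.uuid", true);
       // Alternatively: let the committer self-generate a UUID when none
       // is passed; with require.uuid also true, the require check wins.
       conf.setBoolean("fs.s3a.committer.generate.uuid", true);
     }
   }
   ```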

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -202,24 +233,24 @@ protected final void setOutputPath(Path outputPath) {
    * @return the working path.
    */
   @Override
-  public Path getWorkPath() {
+  public final Path getWorkPath() {
     return workPath;
   }
 
   /**
    * Set the work path for this committer.
    * @param workPath the work path to use.
    */
-  protected void setWorkPath(Path workPath) {
+  protected final void setWorkPath(Path workPath) {
     LOG.debug("Setting work path to {}", workPath);
     this.workPath = workPath;
   }
 
-  public Configuration getConf() {
+  public final Configuration getConf() {
     return conf;
   }
 
-  protected void setConf(Configuration conf) {
+  protected final void setConf(Configuration conf) {

Review comment:
       A lot of these changes don't seem related to the UUID change. I think it would be easier to review if only necessary changes were in this PR.

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
     }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
+   * </ol>
+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched with the job UUID, so that it will
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in.
+   * <p>
+   *   If the option {@link CommitConstants#FS_S3A_COMMITTER_REQUIRE_UUID}
+   *   is set, then an external UUID MUST be passed in.
+   *   This can be used to verify that the spark engine is reliably setting
+   *   unique IDs for staging.
+   * </p>
+   * @param conf job/task configuration
+   * @param jobId job ID from YARN or spark.
+   * @return Job UUID and source of it.
+   * @throws PathCommitException no UUID was found and it was required
+   */
+  public static Pair<String, JobUUIDSource>
+      buildJobUUID(Configuration conf, JobID jobId)
+      throws PathCommitException {
+
+    String jobUUID = conf.getTrimmed(FS_S3A_COMMITTER_UUID, "");
+
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.CommitterUUIDProperty);
+    }
+    // there is no job UUID.
+    // look for one from spark
+    jobUUID = conf.getTrimmed(SPARK_WRITE_UUID, "");
+    if (!jobUUID.isEmpty()) {
+      return Pair.of(jobUUID, JobUUIDSource.SparkWriteUUID);

Review comment:
       This is incorrect if the UUID is self-generated but this method is called again after `setupJob`. I think that method shouldn't set `SPARK_WRITE_UUID`.






[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs slightly better.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-715353793


   Related to this, I'm going to include work related to [SPARK-33230](https://issues.apache.org/jira/browse/SPARK-33230).
   
   If in setupJob there's no "spark.sql.sources.writeJobUUID", a UUID will be set; staging committers can use this to be confident they are getting separate dirs for jobs even when job IDs are the same.
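
   A minimal, hypothetical sketch (not the actual Spark code) of the engine-side half of that contract — setting a fresh UUID per write job in the configuration the committer will read:

   ```
   import java.util.UUID;

   import org.apache.hadoop.conf.Configuration;

   public class WriteJobUuidSketch {
     // Property name as used in this PR / SPARK-33230.
     static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";

     public static void main(String[] args) {
       Configuration jobConf = new Configuration();
       // One fresh UUID per write job: even when two jobs share the same
       // app attempt ID, their staging directories will differ.
       jobConf.set(SPARK_WRITE_UUID, UUID.randomUUID().toString());
       System.out.println("writeJobUUID = " + jobConf.get(SPARK_WRITE_UUID));
     }
   }
   ```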




[GitHub] [hadoop] hadoop-yetus removed a comment on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-719856113








[GitHub] [hadoop] hadoop-yetus removed a comment on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724951069


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 22s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 54s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 32s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m  5s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 14s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 26s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  3s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 10s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 23s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 33s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  22m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 34s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  19m 34s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  root: The patch generated 0 new + 48 unchanged - 1 fixed = 48 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 14s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/8/artifact/out/whitespace-eol.txt) |  The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  19m 47s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |   4m 34s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  12m 32s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 54s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 206m 11s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/8/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux fa46a7df8a67 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2522bf2f9b0 |
   | Default Java | Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/8/testReport/ |
   | Max. process+thread count | 2869 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/8/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] liuml07 commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
liuml07 commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r521130866



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
##########
@@ -585,7 +589,8 @@ public BulkOperationState initiateOperation(final Path path,
   @Retries.RetryTranslated
   public UploadPartResult uploadPart(UploadPartRequest request)
       throws IOException {
-    return retry("upload part",
+    return retry("upload part #" + request.getPartNumber()
+        + " upload "+ request.getUploadId(),

Review comment:
       nit: s/upload/upload ID/
   
   I was thinking of consistent log keywords, so that for any retry log we can search for "upload ID" or "commit ID".

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
##########
@@ -131,6 +131,8 @@ protected WriteOperationHelper(S3AFileSystem owner, Configuration conf) {
    */
   void operationRetried(String text, Exception ex, int retries,
       boolean idempotent) {
+    LOG.info("{}: Retried {}: {}", retries, text, ex.toString());

Review comment:
       The order of the parameters is wrong.
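
   Presumably the intended reading is "<operation>: Retried <count>: <exception>", i.e. something like the following (a sketch, with a hypothetical wrapper class for self-containment):

   ```
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   public class RetryLogSketch {
     private static final Logger LOG =
         LoggerFactory.getLogger(RetryLogSketch.class);

     // Arguments now match the "{}: Retried {}: {}" format string.
     static void operationRetried(String text, Exception ex, int retries) {
       LOG.info("{}: Retried {}: {}", text, retries, ex.toString());
     }
   }
   ```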

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
##########
@@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable {
 
   }
 
+
+  /**
+   * Run two jobs with the same destination and different output paths.
+   * <p></p>
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   * <p></p>
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+    describe("Run two jobs to the same destination, assert they both complete");
+    Configuration conf = getConfiguration();
+    conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+    // this job has a job ID generated and set as the spark UUID;
+    // the config is also set to require it.
+    // This mimics the Spark setup process.
+
+    String stage1Id = UUID.randomUUID().toString();
+    conf.set(SPARK_WRITE_UUID, stage1Id);
+    conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+    // create the job and write data in its task attempt
+    JobData jobData = startJob(true);
+    Job job1 = jobData.job;
+    AbstractS3ACommitter committer1 = jobData.committer;
+    JobContext jContext1 = jobData.jContext;
+    TaskAttemptContext tContext1 = jobData.tContext;
+    Path job1TaskOutputFile = jobData.writtenTextPath;
+
+    // the write path
+    Assertions.assertThat(committer1.getWorkPath().toString())
+        .describedAs("Work path path of %s", committer1)
+        .contains(stage1Id);
+    // now build up a second job
+    String jobId2 = randomJobId();
+
+    // second job will use same ID
+    String attempt2 = taskAttempt0.toString();
+    TaskAttemptID taskAttempt2 = taskAttempt0;
+
+    // create the second job
+    Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+    c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+    Job job2 = newJob(outDir,
+        c2,
+        attempt2);
+    Configuration conf2 = job2.getConfiguration();

Review comment:
       nit: maybe call this `conf2` something like `jobConf2` to make it a bit clearer.

##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
##########
@@ -535,20 +535,28 @@ Conflict management is left to the execution engine itself.
 
 | Option | Magic | Directory | Partitioned | Meaning | Default |
 |--------|-------|-----------|-------------|---------|---------|
-| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file  at the end of each job | `true` |
+| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file on the successful completion of the job. | `true` |
+| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | `${hadoop.tmp.dir}/s3a` |
+| `fs.s3a.committer.magic.enabled` | X |  | | Enable "magic committer" support in the filesystem. | `false` |
+| `fs.s3a.committer.abort.pending.uploads` | X | X | X | list and abort all pending uploads under the destination path when the job is committed or aborted. | `true` |
 | `fs.s3a.committer.threads` | X | X | X | Number of threads in committers for parallel operations on files. | 8 |
-| `fs.s3a.committer.staging.conflict-mode` |  | X | X | Conflict resolution: `fail`, `append` or `replace`| `append` |
-| `fs.s3a.committer.staging.unique-filenames` |  | X | X | Generate unique filenames | `true` |
-| `fs.s3a.committer.magic.enabled` | X |  | | Enable "magic committer" support in the filesystem | `false` |
+| `fs.s3a.committer.generate.uuid` |  | X | X | Generate a Job UUID if none is passed down from Spark | `false` |
+| `fs.s3a.committer.require.uuid` |  | X | X | Require the Job UUID to be passed down from Spark | `false` |
 
 
+Staging committer (Directory and Partitioned) options
 
 
 | Option | Magic | Directory | Partitioned | Meaning | Default |
 |--------|-------|-----------|-------------|---------|---------|
-| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | |
-| `fs.s3a.committer.staging.tmp.path` |  | X | X | Path in the cluster filesystem for temporary data | `tmp/staging` |
 
+| `fs.s3a.committer.staging.conflict-mode` |  | X | X | Conflict resolution: `fail`, `append` or `replace`| `append` |

Review comment:
       There is an empty line between the table header and this first row; the GitHub online viewer is not rendering it as a table. Maybe we should just remove line 552.

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
##########
@@ -118,15 +114,14 @@ public StagingCommitter(Path outputPath,
     Configuration conf = getConf();
     this.uploadPartSize = conf.getLongBytes(
         MULTIPART_SIZE, DEFAULT_MULTIPART_SIZE);
-    this.uuid = getUploadUUID(conf, context.getJobID());
     this.uniqueFilenames = conf.getBoolean(
         FS_S3A_COMMITTER_STAGING_UNIQUE_FILENAMES,
         DEFAULT_STAGING_COMMITTER_UNIQUE_FILENAMES);
-    setWorkPath(buildWorkPath(context, uuid));
+    setWorkPath(buildWorkPath(context, this.getUUID()));

Review comment:
       nit: just `getUUID()` without `this.`?

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitConstants.java
##########
@@ -264,4 +283,25 @@ private CommitConstants() {
   /** Extra Data key for task attempt in pendingset files. */
   public static final String TASK_ATTEMPT_ID = "task.attempt.id";
 
+  /**
+   * Require the spark UUID to be passed down: {@value}.
+   * This is to verify that SPARK-33230 has been applied to spark, and that
+   * {@link InternalCommitterConstants#SPARK_WRITE_UUID} is set.
+   * <p>
+   *   MUST ONLY BE SET WITH SPARK JOBS.
+   * </p>
+   */

Review comment:
       nit: We can make it clear in the javadoc here that the default value is false. Same for `generate.uuid` below.

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
##########
@@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable {
 
   }
 
+
+  /**
+   * Run two jobs with the same destination and different output paths.
+   * <p></p>
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   * <p></p>
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+    describe("Run two jobs to the same destination, assert they both complete");
+    Configuration conf = getConfiguration();
+    conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+    // this job has a job ID generated and set as the spark UUID;
+    // the config is also set to require it.
+    // This mimics the Spark setup process.
+
+    String stage1Id = UUID.randomUUID().toString();
+    conf.set(SPARK_WRITE_UUID, stage1Id);
+    conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+    // create the job and write data in its task attempt
+    JobData jobData = startJob(true);
+    Job job1 = jobData.job;
+    AbstractS3ACommitter committer1 = jobData.committer;
+    JobContext jContext1 = jobData.jContext;
+    TaskAttemptContext tContext1 = jobData.tContext;
+    Path job1TaskOutputFile = jobData.writtenTextPath;
+
+    // the write path
+    Assertions.assertThat(committer1.getWorkPath().toString())
+        .describedAs("Work path path of %s", committer1)
+        .contains(stage1Id);
+    // now build up a second job
+    String jobId2 = randomJobId();
+
+    // second job will use same ID
+    String attempt2 = taskAttempt0.toString();
+    TaskAttemptID taskAttempt2 = taskAttempt0;
+
+    // create the second job
+    Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+    c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+    Job job2 = newJob(outDir,
+        c2,
+        attempt2);
+    Configuration conf2 = job2.getConfiguration();
+    conf2.set("mapreduce.output.basename", "task2");
+    String stage2Id = UUID.randomUUID().toString();
+    conf2.set(SPARK_WRITE_UUID,
+        stage2Id);
+
+    JobContext jContext2 = new JobContextImpl(conf2,
+        taskAttempt2.getJobID());
+    TaskAttemptContext tContext2 =
+        new TaskAttemptContextImpl(conf2, taskAttempt2);
+    AbstractS3ACommitter committer2 = createCommitter(outDir, tContext2);
+    Assertions.assertThat(committer2.getJobAttemptPath(jContext2))
+        .describedAs("Job attempt path of %s", committer2)
+        .isNotEqualTo(committer1.getJobAttemptPath(jContext1));
+    Assertions.assertThat(committer2.getTaskAttemptPath(tContext2))
+        .describedAs("Task attempt path of %s", committer2)
+        .isNotEqualTo(committer1.getTaskAttemptPath(tContext1));
+    Assertions.assertThat(committer2.getWorkPath().toString())
+        .describedAs("Work path path of %s", committer2)
+        .isNotEqualTo(committer1.getWorkPath().toString())
+        .contains(stage2Id);
+    Assertions.assertThat(committer2.getUUIDSource())
+        .describedAs("UUID source of %s", committer2)
+        .isEqualTo(AbstractS3ACommitter.JobUUIDSource.SparkWriteUUID);
+    JobData jobData2 = new JobData(job2, jContext2, tContext2, committer2);
+    setup(jobData2);
+    abortInTeardown(jobData2);
+
+    // the sequence is designed to ensure that job2 has active multipart
+    // uploads during/after job1's work
+
+    // if the committer is a magic committer, MPUs start in the write,
+    // otherwise in task commit.
+    boolean multipartInitiatedInWrite =
+        committer2 instanceof MagicS3GuardCommitter;
+
+    // job2. Here we start writing a file and have that write in progress
+    // when job 1 commits.
+
+    LoggingTextOutputFormat.LoggingLineRecordWriter<Object, Object>
+        recordWriter2 = new LoggingTextOutputFormat<>().getRecordWriter(
+            tContext2);
+
+    LOG.info("Commit Task 1");
+    commitTask(committer1, tContext1);
+
+    if (multipartInitiatedInWrite) {
+      // magic committer runs -commit job1 while a job2 TA has an open
+      // writer (and hence: open MP Upload)
+      LOG.info("Commit Job 1");

Review comment:
       nit: add `multipartInitiatedInWrite` to the log message? Same below

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/InternalCommitterConstants.java
##########
@@ -97,4 +97,30 @@ private InternalCommitterConstants() {
   /** Error message for a path without a magic element in the list: {@value}. */
   public static final String E_NO_MAGIC_PATH_ELEMENT
       = "No " + MAGIC + " element in path";
+
+  /**
+   * The UUID for jobs: {@value}.
+   * This was historically created in Spark 1.x's SQL queries, but "went away".
+   */
+  public static final String SPARK_WRITE_UUID =
+      "spark.sql.sources.writeJobUUID";
+
+  /**
+   * The App ID for jobs: {@value}.
+   */
+  public static final String SPARK_APP_ID = "spark.app.id";

Review comment:
       Is this Spark app ID name constant still used, or did I miss something? 🤔

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
##########
@@ -175,48 +171,17 @@ protected void initFileOutputCommitterOptions(JobContext context) {
   public String toString() {
     final StringBuilder sb = new StringBuilder("StagingCommitter{");
     sb.append(super.toString());
+    sb.append(", commitsDirectory=").append(commitsDirectory);
+    sb.append(", uniqueFilenames=").append(uniqueFilenames);
     sb.append(", conflictResolution=").append(conflictResolution);
+    sb.append(". uploadPartSize=").append(uploadPartSize);

Review comment:
       nit: s/./,

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SuccessData.java
##########
@@ -68,7 +68,7 @@
   /**
    * Serialization ID: {@value}.
    */
-  private static final long serialVersionUID = 507133045258460084L;
+  private static final long serialVersionUID = 507133045258460083L + VERSION;

Review comment:
       😄 👍 






[GitHub] [hadoop] steveloughran commented on a change in pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522948976



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -202,24 +233,24 @@ protected final void setOutputPath(Path outputPath) {
    * @return the working path.
    */
   @Override
-  public Path getWorkPath() {
+  public final Path getWorkPath() {
     return workPath;
   }
 
   /**
    * Set the work path for this committer.
    * @param workPath the work path to use.
    */
-  protected void setWorkPath(Path workPath) {
+  protected final void setWorkPath(Path workPath) {
     LOG.debug("Setting work path to {}", workPath);
     this.workPath = workPath;
   }
 
-  public Configuration getConf() {
+  public final Configuration getConf() {
     return conf;
   }
 
-  protected void setConf(Configuration conf) {
+  protected final void setConf(Configuration conf) {

Review comment:
       The IDE was whining about calling an override point in the constructor, so I turned it off at the same time. Sorry.
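
   For anyone unfamiliar with that IDE warning — calling an overridable method from a constructor runs the subclass override before the subclass's own fields are initialized. A self-contained sketch of the failure mode (all names hypothetical):

   ```
   class Base {
     Base() {
       init();             // virtual call from a constructor
     }
     void init() { }
   }

   class Child extends Base {
     private final String name = "child";
     @Override
     void init() {
       // Runs during Base's constructor, before 'name' is assigned:
       System.out.println(name.length());   // NullPointerException
     }
   }

   public class ConstructorOverrideSketch {
     public static void main(String[] args) {
       new Child();        // throws NPE at construction time
     }
   }
   ```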






[GitHub] [hadoop] hadoop-yetus commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724104455


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  9s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 46s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m  1s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 27s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m 11s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 15s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  2s |  |  trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 10s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 24s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 16s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  22m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 30s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  19m 30s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 56s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/diff-checkstyle-root.txt) |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 42s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  19m 52s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 15s |  |  the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |   5m 24s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  11m 40s |  |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  |   1m 49s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) |  hadoop-aws in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 205m 19s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.fs.s3a.commit.staging.TestStagingCommitter |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux c32f6d9525bc 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ae7b00a9988 |
   | Default Java | Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/testReport/ |
   | Max. process+thread count | 1327 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/console |
   | versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
   | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] steveloughran commented on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs slightly better.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-717395438


   @dongjoon-hyun thanks... doing a bit more on this, as the more tests I write, the more corner cases surface. Think I'm in control now.




[GitHub] [hadoop] steveloughran edited a comment on pull request #2399: HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID.

Posted by GitBox <gi...@apache.org>.
steveloughran edited a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724222044


   Running integration tests on this with Spark + patch and the 3.4.0-SNAPSHOT builds. Ignoring compilation issues with Spark trunk, Hadoop trunk, Scala versions and ScalaTest, I'm running tests in [cloud-integration](https://github.com/hortonworks-spark/cloud-integration).
   
   ```
   S3AParquetPartitionSuite:
   2020-11-09 10:55:36,664 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  commit.AbstractS3ACommitter (AbstractS3ACommitter.java:<init>(180)) - Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID
   2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  output.FileOutputCommitter (FileOutputCommitter.java:<init>(141)) - File Output Committer Algorithm version is 1
   2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  output.FileOutputCommitter (FileOutputCommitter.java:<init>(156)) - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
   2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  commit.AbstractS3ACommitterFactory (S3ACommitterFactory.java:createTaskCommitter(83)) - Using committer directory to output data to s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
   2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  commit.AbstractS3ACommitterFactory (AbstractS3ACommitterFactory.java:createOutputCommitter(54)) - Using Committer StagingCommitter{AbstractS3ACommitter{role=Task committer attempt_20201109105536_0000_m_000000_0, name=directory, outputPath=s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo, workPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536_0000_m_000000_0/_temporary/0/_temporary/attempt_20201109105536_0000_m_000000_0, uuid='d6b6cd70-0303-46a6-8ff4-240dd14511d6', uuid source=JobUUIDSource{text='spark.sql.sources.writeJobUUID'}}, commitsDirectory=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads, uniqueFilenames=true, conflictResolution=APPEND. uploadPartS
 ize=67108864, wrappedCommitter=FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl{jobId=job_20201109105536_0000}; taskId=attempt_20201109105536_0000_m_000000_0, status=''}; org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter@759c53e5}; outputPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads, workPath=null, algorithmVersion=1, skipCleanup=false, ignoreCleanupFailures=false}} for s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
   2020-11-09 10:55:36,736 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  staging.DirectoryStagingCommitter (DirectoryStagingCommitter.java:setupJob(71)) - Conflict Resolution mode is APPEND
   2020-11-09 10:55:36,879 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO  commit.AbstractS3AC
   ```
   
   1. Spark is passing down a unique job ID (committer is configured to require it) ` Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID`
   1. This is used for the local FS work path of the staging committer `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536_0000_m_000000_0/_temporary/0/_temporary/attempt_20201109105536_0000_m_000000_0`
   1. And for the cluster FS (which is file:// here)
   `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads`
   
   That is: Spark is setting the UUID and the committer is picking it up and using it as appropriate.

