Posted to common-issues@hadoop.apache.org by "steveloughran (via GitHub)" <gi...@apache.org> on 2023/05/24 19:13:01 UTC

[GitHub] [hadoop] steveloughran opened a new pull request, #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

steveloughran opened a new pull request, #5689:
URL: https://github.com/apache/hadoop/pull/5689

   
   * changes the default value to keep
   * doesn't log it at info
   * updated docs
   * cut all marker tool commands from the SDK qualification commands
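   
   To make the two values of `fs.s3a.directory.marker.retention` discussed in this PR concrete, here is a minimal Java sketch of what a client does when a file is created. It is a hypothetical model, not the S3A code: `MarkerRetentionSketch`, its `Policy` enum and its methods are illustrative names, configuration is a plain `Map` rather than a Hadoop `Configuration`, only `delete` and `keep` are modelled, and any batching of the marker deletes is not shown. Under `delete`, every parent marker key (`dir/` form) becomes a delete request; under `keep`, nothing extra is issued. An unset key resolves to `keep`, the default this PR proposes.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Locale;
   import java.util.Map;
   
   /** Hypothetical model of marker handling on file creation; not the S3A implementation. */
   final class MarkerRetentionSketch {
   
     static final String KEY = "fs.s3a.directory.marker.retention";
   
     /** Only the two values discussed in this PR are modelled. */
     enum Policy { DELETE, KEEP }
   
     /** Resolves the policy; an unset key falls back to "keep", the default proposed here. */
     static Policy resolve(Map<String, String> conf) {
       String value = conf.getOrDefault(KEY, "keep").trim().toLowerCase(Locale.ROOT);
       switch (value) {
         case "delete":
           return Policy.DELETE;
         case "keep":
           return Policy.KEEP;
         default:
           throw new IllegalArgumentException("Unsupported value for " + KEY + ": " + value);
       }
     }
   
     /**
      * Parent marker keys a client would ask to delete after creating fileKey,
      * nearest parent first: "a/b/c.txt" gives ["a/b/", "a/"]. Empty under KEEP.
      */
     static List<String> markersToDelete(String fileKey, Policy policy) {
       List<String> markers = new ArrayList<>();
       if (policy == Policy.KEEP) {
         return markers; // keep: no extra requests when a file is created
       }
       int slash = fileKey.lastIndexOf('/');
       while (slash > 0) {
         markers.add(fileKey.substring(0, slash + 1));
         slash = fileKey.lastIndexOf('/', slash - 1);
       }
       return markers;
     }
   }
   ```
   
   The cost difference is the point of the change: with `delete`, a file three directories deep triggers marker deletes for all three parents on every write, while `keep` skips them entirely.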
   
   ### How was this patch tested?
   
   * module tests in progress
   * also plan to run some CLI commands to check that the log is quiet.
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] steveloughran commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1573949366

   It's safe when you don't have incompatible clients writing to the same directory tree; more specifically, clients using rename and delete, which effectively means Hive, MapReduce, Spark jobs, etc. The S3A committer doesn't use rename to commit work, but when Spark starts a job it deletes the destination directory, so if delete() mistakes a marker at the root for an empty directory, it may not delete the old data.
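   
   The delete() hazard described above can be reproduced in a tiny in-memory model. This is a hypothetical simplification, not the actual S3A or older-client code: the bucket is a `TreeSet` of object keys, `naiveDelete` stands in for a client that treats a `dir/` marker as proof of an empty directory, and `markerAwareDelete` always deletes everything under the prefix. All class and method names are illustrative.
   
   ```java
   import java.util.TreeSet;
   
   /** Hypothetical in-memory model of the delete() hazard; not the real client code. */
   final class MarkerDeleteHazard {
   
     /** A "bucket" where a marker was kept above data written by an earlier job. */
     static TreeSet<String> bucketWithRetainedMarker() {
       TreeSet<String> bucket = new TreeSet<>();
       bucket.add("output/");          // directory marker retained under "keep"
       bucket.add("output/part-0000"); // old data from a previous job
       return bucket;
     }
   
     /** Marker-unaware style: a "dir/" marker is taken to mean the directory is empty. */
     static void naiveDelete(TreeSet<String> bucket, String dir) {
       String marker = dir + "/";
       if (bucket.contains(marker)) {
         bucket.remove(marker); // wrongly assumes nothing else lives under dir/
         return;
       }
       deleteUnderPrefix(bucket, marker);
     }
   
     /** Marker-aware style: always delete every key under the prefix, marker included. */
     static void markerAwareDelete(TreeSet<String> bucket, String dir) {
       deleteUnderPrefix(bucket, dir + "/");
     }
   
     private static void deleteUnderPrefix(TreeSet<String> bucket, String prefix) {
       bucket.removeIf(key -> key.startsWith(prefix));
     }
   }
   ```
   
   Running `naiveDelete(bucket, "output")` on `bucketWithRetainedMarker()` removes only the `output/` marker and leaves `output/part-0000` in place, which is exactly the stale data a new job would then end up writing alongside.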




[GitHub] [hadoop] dannycjones commented on a diff in pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "dannycjones (via GitHub)" <gi...@apache.org>.
dannycjones commented on code in PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#discussion_r1217662040


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md:
##########
@@ -161,7 +176,7 @@ When a file is created under a path, the directory marker is deleted. And when a
 file is deleted, if it was the last file in the directory, the marker is
 recreated.
 
-And, historically, When a path is listed, if a marker to that path is found, *it
+And, historically, when a path is listed, if a marker to that path is found, *it
 has been interpreted as an empty directory.*

Review Comment:
   ACK.
   
   For public documentation, probably makes sense to keep it as is. It was just a little confusing when digging into the detail.





[GitHub] [hadoop] steveloughran commented on a diff in pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on code in PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#discussion_r1214526572


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md:
##########
@@ -12,35 +12,40 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-# Experimental: Controlling the S3A Directory Marker Behavior
+# Controlling the S3A Directory Marker Behavior
 
-This document discusses an experimental feature of the S3A
-connector since Hadoop 3.3.1: the ability to retain directory
-marker objects above paths containing files or subdirectories.
+This document discusses an performance feature of the S3A
+connector: directory markers are not deleted unless the
+client is explicitly configured to do so.
 
 ## <a name="compatibility"></a> Critical: this is not backwards compatible!
 
 This document shows how the performance of S3 I/O, especially applications
 creating many files (for example Apache Hive) or working with versioned S3 buckets can
 increase performance by changing the S3A directory marker retention policy.
 
-Changing the policy from the default value, `"delete"` _is not backwards compatible_.
+The default policy in this release of hadoop is "keep", 
+which _is not backwards compatible_ with hadoop versions
+released before 2021.
 
-Versions of Hadoop which are incompatible with other marker retention policies,
-as of August 2020.
+The compatibility table of older releases is as follows:
 
-|  Branch    | Compatible Since | Supported           |
-|------------|------------------|---------------------|
-| Hadoop 2.x |       n/a        | WONTFIX             |
-| Hadoop 3.0 |      check       | Read-only           |
-| Hadoop 3.1 |      check       | Read-only           |
-| Hadoop 3.2 |      check       | Read-only           |
-| Hadoop 3.3 |      3.3.1       | Done                |
+| Branch     | Compatible Since | Supported | Released |
+|------------|------------------|-----------|----------|
+| Hadoop 2.x | 2.10.2           | Read-only | 05/2022  |
+| Hadoop 3.0 | n/a              | WONTFIX   |          |
+| Hadoop 3.1 | n/a              | WONTFIX   |          |
+| Hadoop 3.2 | 3.2.2            | Read-only | 01/2022  |
+| Hadoop 3.3 | 3.3.1            | Done      | 01/2021  |

Review Comment:
   Yeah, there was a format error; already fixed.
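   
   As a quick way to check a cluster against the compatibility table in the diff above, its rows can be encoded as a version check. This is a sketch under stated assumptions: `MarkerCompatibility` and `Support` are hypothetical names, `READ_ONLY` and `FULL` simply mirror the table's "Read-only" and "Done" labels, version strings are assumed to be plain `major.minor.patch` with an optional `-suffix`, and anything outside the listed branches (3.4 included) returns `UNKNOWN` rather than a guess.
   
   ```java
   /** Hypothetical encoding of the directory-marker compatibility table. */
   final class MarkerCompatibility {
   
     enum Support { NONE, READ_ONLY, FULL, UNKNOWN }
   
     /** @param version a release string such as "3.2.2"; a "-SNAPSHOT"-style suffix is ignored */
     static Support supportFor(String version) {
       String[] parts = version.split("-", 2)[0].split("\\.");
       int major = Integer.parseInt(parts[0]);
       int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
       int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
       if (major == 2) {
         // row "Hadoop 2.x": compatible since 2.10.2, read-only
         return atLeast(minor, patch, 10, 2) ? Support.READ_ONLY : Support.NONE;
       }
       if (major == 3) {
         switch (minor) {
           case 0:
           case 1:
             return Support.NONE; // rows "Hadoop 3.0" and "Hadoop 3.1": WONTFIX
           case 2:
             return atLeast(minor, patch, 2, 2) ? Support.READ_ONLY : Support.NONE;
           case 3:
             return atLeast(minor, patch, 3, 1) ? Support.FULL : Support.NONE;
           default:
             return Support.UNKNOWN; // branch not listed in the table
         }
       }
       return Support.UNKNOWN;
     }
   
     /** True if (minor, patch) is at or after (minMinor, minPatch). */
     private static boolean atLeast(int minor, int patch, int minMinor, int minPatch) {
       return minor > minMinor || (minor == minMinor && patch >= minPatch);
     }
   }
   ```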





[GitHub] [hadoop] hadoop-yetus commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1561911820

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  6s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 41s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 29s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 39s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 24s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5689/1/artifact/out/blanks-eol.txt) |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 14s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 28s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 20s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  99m 20s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5689/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5689 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
   | uname | Linux 967319691b4d 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8fd2f4f5cfcf5121e871f938041ab9904f6feb14 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5689/1/testReport/ |
   | Max. process+thread count | 577 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5689/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] dannycjones commented on a diff in pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "dannycjones (via GitHub)" <gi...@apache.org>.
dannycjones commented on code in PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#discussion_r1214208594


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md:
##########
@@ -161,7 +176,7 @@ When a file is created under a path, the directory marker is deleted. And when a
 file is deleted, if it was the last file in the directory, the marker is
 recreated.
 
-And, historically, When a path is listed, if a marker to that path is found, *it
+And, historically, when a path is listed, if a marker to that path is found, *it
 has been interpreted as an empty directory.*

Review Comment:
   (This isn't added in this PR, but...) is this really true?
   
   I tried an integration test using `listFiles` on the Hadoop 3.0 code base and it seemed happy. Is it worth being specific about which operations will or won't make this assumption?



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md:
##########
@@ -237,29 +252,19 @@ of backwards compatibility.
 There is now an option `fs.s3a.directory.marker.retention` which controls how
 markers are managed when new files are created
 
-*Default* `delete`: a request is issued to delete any parental directory markers
+1.`delete`: a request is issued to delete any parental directory markers

Review Comment:
   markdown won't like this
   
   ```suggestion
   1. `delete`: a request is issued to delete any parental directory markers
   ```





[GitHub] [hadoop] steveloughran merged pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran merged PR #5689:
URL: https://github.com/apache/hadoop/pull/5689




[GitHub] [hadoop] ayushtkn commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "ayushtkn (via GitHub)" <gi...@apache.org>.
ayushtkn commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1562369779

   Just passing by.
   AFAIK, changing a configuration default is an incompatible change and can be done only in a minor release, so you can do it only for 3.4.0:
   
   > Hadoop-defined properties (names and meanings) SHALL be considered [Public](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Public) and [Stable](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Stable). The units implied by a Hadoop-defined property MUST NOT change, even across major versions. Default values of Hadoop-defined properties SHALL be considered [Public](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Public) and [Evolving](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Evolving).
   
   https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Hadoop_Configuration_Files




[GitHub] [hadoop] dannycjones commented on a diff in pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "dannycjones (via GitHub)" <gi...@apache.org>.
dannycjones commented on code in PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#discussion_r1214190161


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md:
##########
@@ -12,35 +12,40 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-# Experimental: Controlling the S3A Directory Marker Behavior
+# Controlling the S3A Directory Marker Behavior
 
-This document discusses an experimental feature of the S3A
-connector since Hadoop 3.3.1: the ability to retain directory
-marker objects above paths containing files or subdirectories.
+This document discusses an performance feature of the S3A
+connector: directory markers are not deleted unless the
+client is explicitly configured to do so.
 
 ## <a name="compatibility"></a> Critical: this is not backwards compatible!
 
 This document shows how the performance of S3 I/O, especially applications
 creating many files (for example Apache Hive) or working with versioned S3 buckets can
 increase performance by changing the S3A directory marker retention policy.
 
-Changing the policy from the default value, `"delete"` _is not backwards compatible_.
+The default policy in this release of hadoop is "keep", 
+which _is not backwards compatible_ with hadoop versions
+released before 2021.
 
-Versions of Hadoop which are incompatible with other marker retention policies,
-as of August 2020.
+The compatibility table of older releases is as follows:
 
-|  Branch    | Compatible Since | Supported           |
-|------------|------------------|---------------------|
-| Hadoop 2.x |       n/a        | WONTFIX             |
-| Hadoop 3.0 |      check       | Read-only           |
-| Hadoop 3.1 |      check       | Read-only           |
-| Hadoop 3.2 |      check       | Read-only           |
-| Hadoop 3.3 |      3.3.1       | Done                |
+| Branch     | Compatible Since | Supported | Released |
+|------------|------------------|-----------|----------|
+| Hadoop 2.x | 2.10.2           | Read-only | 05/2022  |
+| Hadoop 3.0 | n/a              | WONTFIX   |          |
+| Hadoop 3.1 | n/a              | WONTFIX   |          |
+| Hadoop 3.2 | 3.2.2            | Read-only | 01/2022  |
+| Hadoop 3.3 | 3.3.1            | Done      | 01/2021  |

Review Comment:
   Thanks for updating this with the extra info.
   
   Do we know why the Hadoop webpages aren't formatting the original table? https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/directory_markers.html#The_Problem_with_Directory_Markers





[GitHub] [hadoop] steveloughran commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1570827515

   Noted. Well, let's target 3.4 at the very least and tag it as incompatible.




[GitHub] [hadoop] ayushtkn commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "ayushtkn (via GitHub)" <gi...@apache.org>.
ayushtkn commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1569931307

   Yep, it is like that; there have been lots of discussions and tickets around this. For example, [HDFS-13505](https://issues.apache.org/jira/browse/HDFS-13505) is also marked as incompatible and was pushed only to trunk. This comment says the same thing: (https://issues.apache.org/jira/browse/HDFS-13505?focusedCommentId=16854777&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16854777)
   
   Maybe the reason is this: if someone explicitly wanted the config to be "delete" for whatever reason, and the default value was also "delete", they wouldn't have configured it, relying on the default. If you now change the default to "keep", that user has to explicitly configure it to "delete" to preserve the old behaviour.
   
   Not against this change, just describing the general rules around config defaults, from what I have read or know about compatibility :-) 
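   
   The compatibility concern above comes down to how an effective value is resolved: an explicit setting wins, otherwise the release default applies. The sketch below is a hypothetical model (a plain `Map` instead of a Hadoop `Configuration`; `DefaultChangeSketch` and its methods are illustrative names) showing that a site which never set `fs.s3a.directory.marker.retention` changes behaviour when the default moves from `delete` to `keep`, while a site that set `delete` explicitly does not.
   
   ```java
   import java.util.Map;
   
   /** Hypothetical model of explicit-versus-default resolution for one option. */
   final class DefaultChangeSketch {
   
     static final String KEY = "fs.s3a.directory.marker.retention";
   
     /** An explicitly configured value wins; otherwise the release default applies. */
     static String effective(Map<String, String> siteConfig, String releaseDefault) {
       return siteConfig.getOrDefault(KEY, releaseDefault);
     }
   
     /** True if moving between releases with these defaults changes this site's behaviour. */
     static boolean changedByUpgrade(Map<String, String> siteConfig,
         String oldDefault, String newDefault) {
       return !effective(siteConfig, oldDefault).equals(effective(siteConfig, newDefault));
     }
   }
   ```
   
   In this model, `changedByUpgrade(emptyConfig, "delete", "keep")` is `true` and `changedByUpgrade(configWithDelete, "delete", "keep")` is `false`: only the user who relied on the old default is silently affected.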




[GitHub] [hadoop] steveloughran commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1572316027

   Thanks. I think the tests were all good but will rerun to be 100% sure.




[GitHub] [hadoop] dannycjones commented on a diff in pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "dannycjones (via GitHub)" <gi...@apache.org>.
dannycjones commented on code in PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#discussion_r1214206939


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md:
##########
@@ -12,35 +12,40 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-# Experimental: Controlling the S3A Directory Marker Behavior
+# Controlling the S3A Directory Marker Behavior
 
-This document discusses an experimental feature of the S3A
-connector since Hadoop 3.3.1: the ability to retain directory
-marker objects above paths containing files or subdirectories.
+This document discusses an performance feature of the S3A
+connector: directory markers are not deleted unless the
+client is explicitly configured to do so.

Review Comment:
   If this PR gets updated, there's a small typo to fix:
   
   ```suggestion
   This document discusses a performance feature of the S3A
   connector: directory markers are not deleted unless the
   client is explicitly configured to do so.
   ```





[GitHub] [hadoop] steveloughran commented on a diff in pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on code in PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#discussion_r1214527663


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md:
##########
@@ -161,7 +176,7 @@ When a file is created under a path, the directory marker is deleted. And when a
 file is deleted, if it was the last file in the directory, the marker is
 recreated.
 
-And, historically, When a path is listed, if a marker to that path is found, *it
+And, historically, when a path is listed, if a marker to that path is found, *it
 has been interpreted as an empty directory.*

Review Comment:
   It is true for some specific code paths which do a probe that explicitly looks for empty dirs: rm and mv in particular.
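   
   One way to reconcile the `listFiles` observation with the reply above, sketched as a hypothetical in-memory model rather than the real S3A code paths: a recursive file listing only reports objects under the prefix and never needs to interpret the marker, whereas an "is this an empty directory?" probe of the kind used by rm and mv goes wrong if it treats the mere existence of `dir/` as "empty". `EmptyDirProbeSketch` and its methods are illustrative names.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.TreeSet;
   
   /** Hypothetical model: file listing versus empty-directory probes over sorted object keys. */
   final class EmptyDirProbeSketch {
   
     /** Like a recursive file listing: every non-marker key under dir/. */
     static List<String> listFiles(TreeSet<String> bucket, String dir) {
       String prefix = dir + "/";
       List<String> files = new ArrayList<>();
       for (String key : bucket.tailSet(prefix, false)) {
         if (!key.startsWith(prefix)) {
           break; // keys are sorted, so we have left the prefix
         }
         if (!key.endsWith("/")) {
           files.add(key); // markers are skipped; only files are reported
         }
       }
       return files;
     }
   
     /** Old-style probe: the existence of the "dir/" marker alone means "empty". */
     static boolean markerMeansEmpty(TreeSet<String> bucket, String dir) {
       return bucket.contains(dir + "/");
     }
   
     /** Marker-aware probe: empty only if the marker exists and nothing else is under dir/. */
     static boolean reallyEmpty(TreeSet<String> bucket, String dir) {
       String prefix = dir + "/";
       String next = bucket.higher(prefix); // first key sorting after the marker
       return bucket.contains(prefix) && (next == null || !next.startsWith(prefix));
     }
   }
   ```
   
   With keys `output/`, `output/part-0000` and `output/sub/part-1`, `listFiles` returns both data files while `markerMeansEmpty` still answers `true`; `reallyEmpty` answers `false` because another key follows the marker under the same prefix.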





[GitHub] [hadoop] hadoop-yetus commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1579232711

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  4s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m  5s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 40s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 24s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 13s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 10s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 20s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 103m 13s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5689/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5689 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
   | uname | Linux f4f59a935c3f 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 1b33b71f2323508c50e543468921b0d63f953141 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5689/2/testReport/ |
   | Max. process+thread count | 588 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5689/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   



[GitHub] [hadoop] steveloughran commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1581239126

   @dannycjones: you happy with the changes now?
   I've got Ayush's upvote already



[GitHub] [hadoop] dannycjones commented on a diff in pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "dannycjones (via GitHub)" <gi...@apache.org>.
dannycjones commented on code in PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#discussion_r1217662352


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md:
##########
@@ -12,35 +12,40 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-# Experimental: Controlling the S3A Directory Marker Behavior
+# Controlling the S3A Directory Marker Behavior
 
-This document discusses an experimental feature of the S3A
-connector since Hadoop 3.3.1: the ability to retain directory
-marker objects above paths containing files or subdirectories.
+This document discusses a performance feature of the S3A
+connector: directory markers are not deleted unless the
+client is explicitly configured to do so.
 
 ## <a name="compatibility"></a> Critical: this is not backwards compatible!
 
 This document shows how the performance of S3 I/O, especially applications
 creating many files (for example Apache Hive) or working with versioned S3 buckets can
 increase performance by changing the S3A directory marker retention policy.
 
-Changing the policy from the default value, `"delete"` _is not backwards compatible_.
+The default policy in this release of hadoop is "keep", 
+which _is not backwards compatible_ with hadoop versions
+released before 2021.
 
-Versions of Hadoop which are incompatible with other marker retention policies,
-as of August 2020.
+The compatibility table of older releases is as follows:
 
-|  Branch    | Compatible Since | Supported           |
-|------------|------------------|---------------------|
-| Hadoop 2.x |       n/a        | WONTFIX             |
-| Hadoop 3.0 |      check       | Read-only           |
-| Hadoop 3.1 |      check       | Read-only           |
-| Hadoop 3.2 |      check       | Read-only           |
-| Hadoop 3.3 |      3.3.1       | Done                |
+| Branch     | Compatible Since | Supported | Released |
+|------------|------------------|-----------|----------|
+| Hadoop 2.x | 2.10.2           | Read-only | 05/2022  |
+| Hadoop 3.0 | n/a              | WONTFIX   |          |
+| Hadoop 3.1 | n/a              | WONTFIX   |          |
+| Hadoop 3.2 | 3.2.2            | Read-only | 01/2022  |
+| Hadoop 3.3 | 3.3.1            | Done      | 01/2021  |

Review Comment:
   nice, that's great then




[GitHub] [hadoop] dannycjones commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "dannycjones (via GitHub)" <gi...@apache.org>.
dannycjones commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1573525360

   Sorry for jumping in at the last minute with these.
   
   Basically, I want to make sure users can reason as easily as possible about when it's safe to flip over from `delete` to `keep`.
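   A minimal sketch of that reasoning, in Java, encoding the compatibility table from the directory_markers.md diff reviewed in this PR: 2.10.2, 3.2.2 and 3.3.1 are the first marker-aware releases of their branches, and 3.0 and 3.1 are WONTFIX. The `MarkerCompat` class and its methods are hypothetical and not part of hadoop-aws. Treating the "Read-only" rows as safe to share a bucket with, and treating any branch after 3.3 as marker-aware, are assumptions. Under those assumptions, `keep` is only considered safe when every client version touching the bucket is at or above its branch's "Compatible Since" release.

   ```java
   import java.util.Arrays;
   import java.util.List;

   /** Hypothetical helper modelling the marker compatibility table; not a Hadoop API. */
   public final class MarkerCompat {

       private MarkerCompat() {
       }

       /** Parses a plain numeric version such as "3.3.1" into {3, 3, 1}; missing parts are 0. */
       public static int[] parse(String version) {
           String[] parts = version.split("\\.");
           int[] v = new int[3];
           for (int i = 0; i < 3 && i < parts.length; i++) {
               v[i] = Integer.parseInt(parts[i]);
           }
           return v;
       }

       /** Compares {major, minor, patch} component by component. */
       public static int compare(int[] a, int[] b) {
           for (int i = 0; i < 3; i++) {
               if (a[i] != b[i]) {
                   return Integer.compare(a[i], b[i]);
               }
           }
           return 0;
       }

       /** First marker-aware release of this version's branch, or null if the branch never got one. */
       public static int[] compatibleSince(int[] v) {
           if (v[0] == 2) {
               return new int[] {2, 10, 2};      // table row "Hadoop 2.x | 2.10.2"
           }
           if (v[0] == 3) {
               switch (v[1]) {
                   case 0:
                   case 1:
                       return null;               // table rows marked WONTFIX
                   case 2:
                       return new int[] {3, 2, 2};
                   case 3:
                       return new int[] {3, 3, 1};
                   default:
                       return new int[] {3, v[1], 0}; // assumption: later branches are marker-aware
               }
           }
           return v[0] > 3 ? new int[] {v[0], 0, 0} : null; // assumption for future major versions
       }

       /** True if a client at this version can coexist with retained directory markers. */
       public static boolean isMarkerAware(String version) {
           int[] v = parse(version);
           int[] min = compatibleSince(v);
           return min != null && compare(v, min) >= 0;
       }

       /** "keep" is only safe when every client sharing the bucket is marker-aware. */
       public static boolean isKeepSafe(List<String> clientVersions) {
           for (String version : clientVersions) {
               if (!isMarkerAware(version)) {
                   return false;
               }
           }
           return true;
       }

       public static void main(String[] args) {
           System.out.println(isKeepSafe(Arrays.asList("3.3.5", "3.2.4", "2.10.2"))); // prints true
           System.out.println(isKeepSafe(Arrays.asList("3.3.5", "3.1.4")));           // prints false
       }
   }
   ```

   The check is deliberately "all or nothing": a single pre-2021 client on the 3.0 or 3.1 branch is enough to keep the policy at `delete`.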



[GitHub] [hadoop] steveloughran commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1569802418

   @ayushtkn really? Default values are immutable? Not sure about that, as a lot of things change implicitly, or for good reason ("the defaults weren't good").
   
   While the names, meanings and units are all stable, default values are public/evolving.



[GitHub] [hadoop] steveloughran commented on pull request #5689: HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep"

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5689:
URL: https://github.com/apache/hadoop/pull/5689#issuecomment-1642513305

   Update: I have a PR of this for branch-3.3 which does everything except change the default and document that change:
   
   #5859 
   
   This is so that those releases stop warning about incompatibility, and to keep the code more in sync between branches.

