You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "steveloughran (via GitHub)" <gi...@apache.org> on 2023/02/17 19:19:39 UTC

[GitHub] [hadoop] steveloughran opened a new pull request, #5412: HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion

steveloughran opened a new pull request, #5412:
URL: https://github.com/apache/hadoop/pull/5412

   
   
   Even though DiskChecker.mkdirsWithExistsCheck() will create the directory tree, it is only called *after* the enumeration of directories with available space has completed.
   Directories which don't exist are reported as having 0 space, therefore the mkdirs code is never reached.
   Adding a simple mkdirs() -without bothering to check the outcome- ensures that if a dir has been deleted then it will be reconstructed if possible. If it can't it will still have 0 bytes of space reported and so be excluded from the allocation.
   
   No need to call exists() first as that is what mkdirs() does first anyway.
   
   
   ### How was this patch tested?
   
   new tests, doing the s3a and abfs runs as diligence
   
   ### For code changes:
   
   - [X] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] steveloughran merged pull request #5412: HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran merged PR #5412:
URL: https://github.com/apache/hadoop/pull/5412


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5412: HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1438920330

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m 40s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  23m 17s |  |  trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |  20m 26s |  |  trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 15s |  |  trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 44s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 30s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 24s |  |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |  22m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 30s |  |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  20m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 39s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 15s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  0s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 220m 17s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5412/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5412 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux cfad1a517cb2 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 236adfefd67b469cb0bab443e5422fdb3b582d39 |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5412/2/testReport/ |
   | Max. process+thread count | 1291 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5412/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] steveloughran commented on pull request #5412: HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1439805659

   to completely clear: that temp dir creation already happens on the first invocation of getLocalPathForWrite, this just adds it to every call (this is cheap if the path exists()).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] steveloughran commented on pull request #5412: HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1438609730

   spotbugs was complaining that mkdirs() result was ignored. logging at debug to get it to SFTU


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5412: HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1435380563

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 35s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  51m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  23m 14s |  |  trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |  20m 39s |  |  trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 16s |  |  trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 35s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 37s |  |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |  22m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 40s |  |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  20m 40s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 41s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   2m 44s | [/new-spotbugs-hadoop-common-project_hadoop-common.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5412/1/artifact/out/new-spotbugs-hadoop-common-project_hadoop-common.html) |  hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  25m 36s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 24s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  3s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 226m 49s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-common-project/hadoop-common |
   |  |  Exceptional return value of java.io.File.mkdirs() ignored in org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(String, long, Configuration, boolean)  At LocalDirAllocator.java:ignored in org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(String, long, Configuration, boolean)  At LocalDirAllocator.java:[line 422] |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5412/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5412 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux a8412891df6f 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7908fe9bccc68221952a266f508863606f638d1d |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5412/1/testReport/ |
   | Max. process+thread count | 1291 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5412/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Re: [PR] HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion [hadoop]

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1954460355

   well, add a new JIRW with fix then. thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] steveloughran commented on pull request #5412: HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1439803446

   Currently `LocalDirAllocator.getLocalPathForWrite()` will create all target dirs if absent on first invocation or if the config is changed via `confChanged()`
   
   On later calls, checkWrite is called after a dir has been selected. This method contains both parent dir recreation *and* explicit attempt to create a file under it.
   
   But the parent dir creation is too late for the disk space allocator, which will have already got 0 as the available space if a dir doesn't exist.
   
   What does this mean? It means that long-lived LocalDirAllocator instances cannot recover from the deletion of temp directories, even though checkWrite is meant to be able to do so. Calling mkdirs() earlier *and ignoring the result* means that if hadoop.tmp.dir has been deleted there will be a best effort attempt to recreate it before looking for free disk space. By not checking at the return code we are avoiding changing where things may fail and all the complications that might bring, so it is not a perfect fix. It's just a lot simpler than something more sophisticated like handling the special case where there are no free directories buy triggering a whole new confChanged()-style rescan in the hope that will bring back a directory or even disk volume.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] steveloughran commented on pull request #5412: HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1439886887

   thanks. That checkwrite is really a safety check about whether a dir is *still* writable; the allocator precreates all parent dirs whenever the config changes
   
   merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Re: [PR] HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion [hadoop]

Posted by "lowezheng (via GitHub)" <gi...@apache.org>.
lowezheng commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1954058932

   ${hadoop.tmp.dir} is set to "/tmp/hadoop-root" by default. This path can be deleted by the operating system or manually. This will cause the above problems. I think the same logic needs to be added to the screenshot below in the code. in version 3.3.6
   
   <img width="1041" alt="image" src="https://github.com/apache/hadoop/assets/1687284/6c687c7f-77b7-4f14-bcda-baa62cdc88b3">
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org