Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/06/26 12:28:00 UTC

[jira] [Commented] (HADOOP-18781) ABFS Output stream thread pools getting shutdown during GC.

    [ https://issues.apache.org/jira/browse/HADOOP-18781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737126#comment-17737126 ] 

ASF GitHub Bot commented on HADOOP-18781:
-----------------------------------------

mehakmeet opened a new pull request, #5780:
URL: https://github.com/apache/hadoop/pull/5780

   ### Description of PR
   Applications can use AzureBlobFileSystem to create an AbfsOutputStream and then write through that stream. However, the output stream holds no reference to the FS instance that created it, so the FS instance can become eligible for garbage collection while the stream is still in use. When that happens, AzureBlobFileSystem's `finalize()` method is called. It closes the FS, which in turn closes the AzureBlobFileSystemStore, and the store uses the same thread pool as the AbfsOutputStream. The pool is therefore shut down while writes are still running in the background, and the write hangs.
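   
   A minimal sketch of the hazard, for illustration only. `SketchFileSystem`, `SketchOutputStream` and `SharedPoolHazard` are hypothetical stand-ins, not the real ABFS classes. The file system owns a thread pool, hands that pool to the stream without handing over a reference to itself, and shuts the pool down in `close()`. Because GC timing is non-deterministic, the sketch calls `close()` directly to stand in for the finalizer. In this sketch the failure surfaces as a `RejectedExecutionException` from the shared pool; in ABFS the reported symptom is a hang.
   
   ```java
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;
   import java.util.concurrent.RejectedExecutionException;
   
   // Hypothetical stand-ins for AzureBlobFileSystem / AbfsOutputStream, not the real classes.
   class SharedPoolHazard {
   
     /** Owns the thread pool and shuts it down on close(), like the store does. */
     static class SketchFileSystem implements AutoCloseable {
       final ExecutorService pool = Executors.newFixedThreadPool(2);
   
       SketchOutputStream create() {
         // The stream receives the pool, but no reference to this file system.
         return new SketchOutputStream(pool);
       }
   
       @Override
       public void close() {
         pool.shutdownNow();
       }
     }
   
     /** Uploads blocks on the shared pool; holds no back-reference to its creator. */
     static class SketchOutputStream {
       private final ExecutorService pool;
   
       SketchOutputStream(ExecutorService pool) {
         this.pool = pool;
       }
   
       Future<?> uploadBlock(byte[] block) {
         return pool.submit(() -> { /* pretend to upload the block */ });
       }
     }
   
     /** Returns true if an upload is rejected after the owning file system was closed. */
     static boolean uploadAfterOwnerClosed() {
       SketchFileSystem fs = new SketchFileSystem();
       SketchOutputStream out = fs.create();
       // Stand-in for the finalizer running once 'fs' becomes unreachable.
       fs.close();
       try {
         out.uploadBlock(new byte[16]);
         return false;
       } catch (RejectedExecutionException e) {
         // The default AbortPolicy rejects tasks submitted after shutdown.
         return true;
       }
     }
   
     public static void main(String[] args) {
       System.out.println("rejected: " + uploadAfterOwnerClosed());
     }
   }
   ```
   
   `uploadAfterOwnerClosed()` returns `true`: the pool was shut down underneath a stream that was still open for writing.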
   
   ### How was this patch tested?
   `mvn -Dparallel-tests=abfs -DtestsThreadCount=8 -Dscale clean verify` on `us-west-2`
   
   ```
   [INFO] 
   [INFO] Tests run: 141, Failures: 0, Errors: 0, Skipped: 4
   ```
   
   ```
   [INFO] 
   [ERROR] Tests run: 582, Failures: 5, Errors: 1, Skipped: 107
   ```
   I'm seeing "lease"-related errors. They seem to be specific to my bucket (I have seen the same failures on trunk too), so they are unrelated to this change.
   
   ```
   [ERROR] Failures: 
   [ERROR]   ITestAbfsReadWriteAndSeek.testReadAndWriteWithDifferentBufferSizesAndSeek:78->testReadWriteAndSeek:111 [Retry was required due to issue on server side] expected:<[0]> but was:<[1]>
   [INFO] 
   [ERROR] Tests run: 339, Failures: 1, Errors: 0, Skipped: 41
   ```
   I'm seeing this failure on trunk as well, but I'm not sure about its cause.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue ID (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> ABFS Output stream thread pools getting shutdown during GC.
> -----------------------------------------------------------
>
>                 Key: HADOOP-18781
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18781
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>            Reporter: Mehakmeet Singh
>            Assignee: Mehakmeet Singh
>            Priority: Major
>
> Applications can use AzureBlobFileSystem to create an AbfsOutputStream and then write through that stream. However, the output stream holds no reference to the FS instance that created it, so the FS instance can become eligible for garbage collection while the stream is still in use. When that happens, AzureBlobFileSystem's `finalize()` method is called. It closes the FS, which in turn closes the AzureBlobFileSystemStore, and the store uses the same thread pool as the AbfsOutputStream. The pool is therefore shut down while writes are still running in the background, and the write hangs.
>  
> *Solution:*
> Pass a back-reference to the AzureBlobFileSystem into both AzureBlobFileSystemStore and AbfsOutputStream.
>  
> The same should be done for AbfsInputStream.
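
A minimal sketch of the proposed back-reference, again using hypothetical stand-in classes (`SketchFileSystem`, `SketchOutputStream`, `BackReferenceFix`) rather than the real ABFS code. The stream keeps a strong reference to the file system that created it, so while the application holds the stream, the file system stays strongly reachable and cannot be finalized (and so cannot shut down the shared pool). The check uses a `WeakReference` and `System.gc()`. With the back-reference, the referent cannot be cleared while the stream is reachable, so the result is deterministic. Without the back-reference, collection becomes possible but is not guaranteed, so the sketch does not try to assert that case.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins illustrating the back-reference fix, not the real ABFS classes.
class BackReferenceFix {

  static class SketchFileSystem implements AutoCloseable {
    final ExecutorService pool = Executors.newFixedThreadPool(2);

    SketchOutputStream create() {
      // Pass 'this' so the stream keeps its creator strongly reachable.
      return new SketchOutputStream(this, pool);
    }

    @Override
    public void close() {
      pool.shutdownNow();
    }
  }

  static class SketchOutputStream {
    // Strong back-reference: while the stream is reachable, so is the file system,
    // so the GC cannot finalize it and shut down the shared pool mid-write.
    private final SketchFileSystem owner;
    private final ExecutorService pool;

    SketchOutputStream(SketchFileSystem owner, ExecutorService pool) {
      this.owner = owner;
      this.pool = pool;
    }

    SketchFileSystem owner() {
      return owner;
    }

    Future<?> uploadBlock(byte[] block) {
      return pool.submit(() -> { /* pretend to upload the block */ });
    }
  }

  /** Drops the caller's FS reference, requests a GC, and reports whether the FS survived. */
  static boolean ownerSurvivesGc() {
    SketchFileSystem fs = new SketchFileSystem();
    SketchOutputStream out = fs.create();
    WeakReference<SketchFileSystem> weakFs = new WeakReference<>(fs);
    fs = null;                        // the application keeps only the stream
    System.gc();                      // only a request; the JVM may or may not collect
    boolean alive = weakFs.get() != null;
    Reference.reachabilityFence(out); // keep 'out' (and thus its owner) reachable up to here
    out.owner().close();              // release the pool explicitly once writing is done
    return alive;
  }

  public static void main(String[] args) {
    System.out.println("owner survived GC: " + ownerSurvivesGc());
  }
}
```

The same pattern would apply to an input stream: any object that borrows resources owned by the file system keeps the file system reachable for as long as it is itself in use.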



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
