Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/09/20 19:13:00 UTC
[jira] [Commented] (HADOOP-18347) Restrict vectoredIO threadpool to reduce memory pressure

    [ https://issues.apache.org/jira/browse/HADOOP-18347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607370#comment-17607370 ] 

ASF GitHub Bot commented on HADOOP-18347:
-----------------------------------------

mukund-thakur opened a new pull request, #4918:
URL: https://github.com/apache/hadoop/pull/4918

   part of HADOOP-18103.
   Also introduces a config option, fs.s3a.vectored.active.ranged.reads, which sets the maximum number of range reads a single input stream can have active (downloading or queued) in the central FileSystem instance's pool of queued operations. This stops a single stream from overloading the shared thread pool.
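
A minimal sketch of the idea, not the code in this PR: one way to cap how many range reads a single stream has queued or running on a shared pool is a per-stream semaphore. The class `BoundedRangeReads`, its `submit` method, and the pool size of 16 are hypothetical names and values chosen for illustration; only `java.util.concurrent` is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: caps the range reads one stream has active in a shared pool. */
public class BoundedRangeReads {
  private final ExecutorService sharedPool;
  private final Semaphore permits;

  public BoundedRangeReads(ExecutorService sharedPool, int maxActiveRangedReads) {
    this.sharedPool = sharedPool;
    this.permits = new Semaphore(maxActiveRangedReads);
  }

  /** Blocks the caller until a permit is free, then queues the read on the shared pool. */
  public <T> CompletableFuture<T> submit(Supplier<T> read) throws InterruptedException {
    permits.acquire();
    try {
      return CompletableFuture.supplyAsync(() -> {
        try {
          return read.get();
        } finally {
          permits.release(); // free the slot once the read has finished
        }
      }, sharedPool);
    } catch (RuntimeException e) {
      permits.release(); // the pool rejected the task, so it will never run
      throw e;
    }
  }

  /** Runs simulated range reads and returns the peak number in flight at once. */
  public static int peakConcurrency(int reads, int limit) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(16); // stands in for the shared pool
    try {
      BoundedRangeReads stream = new BoundedRangeReads(pool, limit);
      AtomicInteger active = new AtomicInteger();
      AtomicInteger peak = new AtomicInteger();
      List<CompletableFuture<Integer>> futures = new ArrayList<>();
      for (int i = 0; i < reads; i++) {
        final int range = i;
        futures.add(stream.submit(() -> {
          int now = active.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(5); // simulated download
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          active.decrementAndGet();
          return range;
        }));
      }
      CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
      return peak.get();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("peak in-flight reads: " + peakConcurrency(50, 4));
  }
}
```

Even though the stand-in pool has 16 threads, the stream in this sketch never has more reads in flight than its limit, which leaves the remaining threads free for other streams.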
   
   
   ### Description of PR
   
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> Restrict vectoredIO threadpool to reduce memory pressure
> --------------------------------------------------------
>
>                 Key: HADOOP-18347
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18347
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: common, fs, fs/adl, fs/s3
>            Reporter: Rajesh Balamohan
>            Assignee: Mukund Thakur
>            Priority: Major
>              Labels: performance
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L964-L967
> Currently, it fetches all the ranges with an unbounded thread pool. This does not cause memory pressure with standard benchmarks like TPC-DS. However, when a large number of ranges are present in large files, it could spike the memory usage of the task. Limiting the thread pool size could reduce the memory usage.
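
To illustrate the concern in the issue description, here is a generic `java.util.concurrent` sketch, not the S3A code: an unbounded pool grows to roughly one thread (and one in-flight buffer) per range, while a fixed-size pool holds the number of concurrent fetches to its size. The class and method names, the 1 KiB buffers, and the 50 ms simulated fetch are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
  /** Fetches simulated ranges on the pool and returns the largest pool size reached. */
  private static int largestPoolSize(ThreadPoolExecutor pool, int ranges) throws Exception {
    List<Future<byte[]>> results = new ArrayList<>();
    for (int i = 0; i < ranges; i++) {
      results.add(pool.submit(() -> {
        Thread.sleep(50);       // simulated range fetch
        return new byte[1024];  // simulated buffer held per active range
      }));
    }
    for (Future<byte[]> f : results) {
      f.get();
    }
    pool.shutdown();
    return pool.getLargestPoolSize();
  }

  /** Unbounded pool: a new thread is started whenever no idle thread is available. */
  public static int unboundedPeak(int ranges) throws Exception {
    return largestPoolSize(new ThreadPoolExecutor(
        0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>()), ranges);
  }

  /** Bounded pool: at most `threads` fetches run at once; the rest wait in the queue. */
  public static int boundedPeak(int ranges, int threads) throws Exception {
    return largestPoolSize(new ThreadPoolExecutor(
        threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>()), ranges);
  }

  public static void main(String[] args) throws Exception {
    System.out.println("unbounded pool threads: " + unboundedPeak(64));
    System.out.println("bounded pool threads:   " + boundedPeak(64, 4));
  }
}
```

In this sketch, worst-case buffered memory scales with the number of threads times the range size, so capping the pool also caps that product.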


