
Posted to common-issues@hadoop.apache.org by "Sneha Vijayarajan (Jira)" <ji...@apache.org> on 2020/10/03 22:30:00 UTC

[jira] [Updated] (HADOOP-17250) ABFS: Random read perf improvement

     [ https://issues.apache.org/jira/browse/HADOOP-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sneha Vijayarajan updated HADOOP-17250:
---------------------------------------
    Summary: ABFS: Random read perf improvement  (was: ABFS: Allow random read sizes to be of buffer size)

> ABFS: Random read perf improvement
> ----------------------------------
>
>                 Key: HADOOP-17250
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17250
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.0
>            Reporter: Sneha Vijayarajan
>            Assignee: Sneha Vijayarajan
>            Priority: Major
>              Labels: abfsactive, pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The ADLS Gen2/ABFS driver is optimized to read only the requested bytes when the read pattern is random.
> In some Spark jobs it was observed that, although the reads are random, the next read often starts only a short distance from the previous one and could have been served from the earlier read had that read fetched a full buffer. As a result, these jobs issued more read calls and took longer to run.
> When the same jobs were run against Gen1, which always reads a full buffer, they performed well.
> This Jira aims to add a configuration option that controls whether a random read fetches only the requested size or a full buffer.
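
The effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the ABFS driver's code: the class RandomReadSketch, the method countRemoteReads, and the readFullBuffer flag are hypothetical names invented for this example, and the offsets and the 4096-byte buffer are illustrative values rather than measurements.

```java
// Hypothetical sketch: counts how many remote reads a sequence of
// nearby "random" reads triggers, depending on whether each remote
// read fetches only the requested bytes or a whole buffer.
// None of these names come from the ABFS driver.
class RandomReadSketch {

    static int countRemoteReads(long[] offsets, int requestLen,
                                int bufferSize, boolean readFullBuffer) {
        long bufStart = 0;   // start of the range currently held in memory
        long bufEnd = 0;     // exclusive end; empty range at the beginning
        int remoteReads = 0;

        for (long offset : offsets) {
            boolean servedFromBuffer =
                offset >= bufStart && offset + requestLen <= bufEnd;
            if (servedFromBuffer) {
                continue;    // an earlier read already fetched these bytes
            }
            // Miss: go to the store. Fetch either exactly what was asked
            // for, or a full buffer starting at the requested offset.
            int fetchLen = readFullBuffer
                ? Math.max(bufferSize, requestLen)
                : requestLen;
            bufStart = offset;
            bufEnd = offset + fetchLen;
            remoteReads++;
        }
        return remoteReads;
    }

    public static void main(String[] args) {
        // Reads that jump around, but mostly by less than one buffer.
        long[] offsets = {0L, 1000L, 2000L, 500000L, 501000L};
        System.out.println(countRemoteReads(offsets, 100, 4096, false)); // prints 5
        System.out.println(countRemoteReads(offsets, 100, 4096, true));  // prints 2
    }
}
```

With request-sized reads every access goes to the store, while buffer-sized reads let the nearby follow-up reads be served from memory. The trade-off is that each remote read transfers more bytes, which is presumably why the behaviour is left to configuration.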



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
