Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2019/06/06 17:14:01 UTC

[jira] [Commented] (HADOOP-16317) ABFS: improve random read performance

    [ https://issues.apache.org/jira/browse/HADOOP-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857907#comment-16857907 ] 

Steve Loughran commented on HADOOP-16317:
-----------------------------------------

Update

The HADOOP-15229 API lets callers who switch to the openFile() API pass in options. If you want to define a standard seek policy option with some standard values (open issue: what should those standard values be?), then it could be shared across stores.

Probable set of values:
* default: whatever the store's default is
* adaptive: adapt to the read pattern as it is observed
* sequential: sequential reads
* random: warn the store to expect arbitrary random IO
* columnar: columnar formats. Could map to random, but gives the implementations the chance to do something even more specific for those read plans.
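
To make the list concrete, here is a rough sketch of how those values could be modelled and parsed from an option string. Everything in it is an assumption for illustration: the SeekPolicy enum, the "example.openfile.seek.policy" key and the effectiveFor() helper are hypothetical and not part of any existing Hadoop API.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical shared seek policies, mirroring the values proposed above. */
enum SeekPolicy {
    DEFAULT, ADAPTIVE, SEQUENTIAL, RANDOM, COLUMNAR;

    /** Hypothetical option key; not an existing Hadoop configuration name. */
    static final String OPTION_KEY = "example.openfile.seek.policy";

    /** Parse an option value, falling back to DEFAULT when unset or unrecognized. */
    static SeekPolicy fromOption(String value) {
        if (value == null) {
            return DEFAULT;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return DEFAULT;
        }
    }

    /** A store with no columnar-specific read plan treats COLUMNAR as RANDOM. */
    SeekPolicy effectiveFor(boolean storeSupportsColumnar) {
        return (this == COLUMNAR && !storeSupportsColumnar) ? RANDOM : this;
    }
}

final class SeekPolicyDemo {
    public static void main(String[] args) {
        Map<String, String> options = Map.of(SeekPolicy.OPTION_KEY, "columnar");
        SeekPolicy requested = SeekPolicy.fromOption(options.get(SeekPolicy.OPTION_KEY));
        System.out.println(requested + " -> " + requested.effectiveFor(false));
    }
}
```

Falling back to DEFAULT on an unknown value is one possible choice; it would let a newer client pass a policy an older store does not know about without failing the open.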

There is a seek option for openFile() and S3A. You could do one for ABFS, but it'd be a lot better to have a unified one for abfs+s3a+wasb, maybe even HDFS.
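
As a sketch of how a unified option might coexist with a store-specific one: the resolver below checks the store's own key first and only then a shared key. The shared key name and the SeekPolicyResolver class are hypothetical; "fs.s3a.experimental.input.fadvise" is the existing S3A option. The precedence shown is an assumption, not a settled design.

```java
import java.util.HashMap;
import java.util.Map;

final class SeekPolicyResolver {
    /** Hypothetical cross-store key; an assumption, not an existing Hadoop option. */
    static final String SHARED_KEY = "example.openfile.seek.policy";

    /** The existing S3A input policy option. */
    static final String S3A_KEY = "fs.s3a.experimental.input.fadvise";

    /**
     * Resolve the policy string for one store: the store-specific key wins
     * if set, then the shared key, then the supplied fallback.
     */
    static String resolve(Map<String, String> options, String storeKey, String fallback) {
        String storeValue = options.get(storeKey);
        if (storeValue != null) {
            return storeValue;
        }
        String sharedValue = options.get(SHARED_KEY);
        return sharedValue != null ? sharedValue : fallback;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(SHARED_KEY, "random");
        System.out.println(resolve(options, S3A_KEY, "normal"));
    }
}
```

Letting the store-specific key win would keep existing S3A configurations behaving as they do today, while stores without their own option pick up the shared value.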



> ABFS: improve random read performance
> -------------------------------------
>
>                 Key: HADOOP-16317
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16317
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.2.0
>            Reporter: Da Zhou
>            Priority: Major
>
> Improving random read performance is an interesting topic. ABFS doesn't perform well when reading columnar format files, as the process involves many seek operations which make readAhead useless, and if readAhead is used unwisely it leads to unnecessary data requests.
> Hence creating this Jira as a reminder to track the investigation and progress of the work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)