Posted to common-issues@hadoop.apache.org by "Anoop Sam John (Jira)" <ji...@apache.org> on 2020/08/10 06:19:00 UTC

[jira] [Commented] (HADOOP-17038) Support positional read in AbfsInputStream

    [ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174098#comment-17174098 ] 

Anoop Sam John commented on HADOOP-17038:
-----------------------------------------

As mentioned in the description, the main advantage is for HBase, where most reads are short random reads. By default, HBase uses only positional reads for gets and scans. Scans have a tracking mechanism: if a scanner reads consecutive blocks, we switch back to stream-based reads (the seek + read model). Scans run during compaction also use stream reads, i.e. seek + read. For these long reads (especially compaction, where only the compaction thread works on its dedicated FileInputStream), reading 4 MB per remote read is very useful. So it is not a good idea to reduce fs.azure.read.request.size. Reducing it would help the normal random row-get case, but compactions would then put more pressure on the FS. Range scans on the same cluster might also suffer.
This is where true positional reads pay off. In this patch, the positional read API is overridden in AbfsInputStream so that it does not rely on the buffer at all. As a result, the API is no longer synchronized. It also reads only the exact number of bytes requested.
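
To make the trade-off concrete, here is a minimal sketch of the two read paths. It is not the patch itself: PreadSketch, fetch, seekAndRead and pread are hypothetical names, the "remote" store is an in-memory byte array, and the 4 MB request size is assumed from the discussion above.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a remote file; NOT the real AbfsInputStream. */
final class PreadSketch {
    // Assumed read request size (4 MB), standing in for fs.azure.read.request.size.
    static final int READ_REQUEST_SIZE = 4 * 1024 * 1024;

    private final byte[] remote;                             // pretend remote blob
    private final AtomicLong bytesFetched = new AtomicLong(); // bytes pulled from "remote"

    // Stream-read state: one shared buffer and position, hence the lock below.
    private byte[] buffer = new byte[0];
    private long bufferStart = 0;
    private long pos = 0;

    PreadSketch(byte[] remote) {
        this.remote = remote;
    }

    /** Simulated remote call: copies up to len bytes starting at offset. */
    private int fetch(long offset, byte[] dst, int dstOff, int len) {
        if (offset >= remote.length) {
            return -1;
        }
        int n = (int) Math.min(len, remote.length - offset);
        System.arraycopy(remote, (int) offset, dst, dstOff, n);
        bytesFetched.addAndGet(n);
        return n;
    }

    /** Buffered seek + read: refills a READ_REQUEST_SIZE buffer on a miss. */
    synchronized int seekAndRead(long position, byte[] dst, int off, int len) {
        pos = position;
        boolean inBuffer = pos >= bufferStart && pos < bufferStart + buffer.length;
        if (!inBuffer) {
            byte[] fresh = new byte[READ_REQUEST_SIZE];
            int filled = fetch(pos, fresh, 0, fresh.length);
            if (filled < 0) {
                return -1;
            }
            buffer = Arrays.copyOf(fresh, filled);
            bufferStart = pos;
        }
        int bufOff = (int) (pos - bufferStart);
        int n = Math.min(len, buffer.length - bufOff);
        System.arraycopy(buffer, bufOff, dst, off, n);
        pos += n;
        return n;
    }

    /** Positional read: touches no shared buffer or position, so no lock. */
    int pread(long position, byte[] dst, int off, int len) {
        return fetch(position, dst, off, len);
    }

    long bytesFetched() {
        return bytesFetched.get();
    }
}
```

In this sketch a single 64 KB request through seekAndRead pulls a full 4 MB from the simulated store, while the same request through pread pulls only 64 KB. That is the gap the random-get case cares about, without shrinking the request size that compactions benefit from.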


> Support positional read in AbfsInputStream
> ------------------------------------------
>
>                 Key: HADOOP-17038
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17038
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Major
>              Labels: HBase, abfsactive
>
> Right now a positional read seeks to the position, reads, and then seeks back to the old position (as per the implementation in the superclass).
> HBase-style workloads rely mostly on short preads (64 KB by default). So it would be ideal to support a pure positional read API that does not keep the data in a buffer at all and reads only the data the caller asks for, without reading ahead as dictated by the read size config.
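
The seek, read, seek-back pattern the description refers to can be sketched as follows. This is an illustrative in-memory stand-in, not the Hadoop superclass code: SeekBackSketch and its methods are hypothetical names, and only the shape of the pattern is assumed from the description.

```java
/** Hypothetical in-memory stream that mirrors the seek, read, seek-back pattern. */
final class SeekBackSketch {
    private final byte[] data;
    private long pos = 0;

    SeekBackSketch(byte[] data) {
        this.data = data;
    }

    synchronized void seek(long newPos) {
        if (newPos < 0 || newPos > data.length) {
            throw new IllegalArgumentException("bad seek: " + newPos);
        }
        pos = newPos;
    }

    synchronized long getPos() {
        return pos;
    }

    /** Stateful read: advances the shared stream position. */
    synchronized int read(byte[] dst, int off, int len) {
        if (pos >= data.length) {
            return -1;
        }
        int n = (int) Math.min(len, data.length - pos);
        System.arraycopy(data, (int) pos, dst, off, n);
        pos += n;
        return n;
    }

    /** Positional read built on the stateful calls, so it must hold the lock. */
    synchronized int read(long position, byte[] dst, int off, int len) {
        long oldPos = getPos();
        try {
            seek(position);
            return read(dst, off, len);
        } finally {
            seek(oldPos); // restore the caller's stream position
        }
    }
}
```

Because the positional read temporarily moves the shared position, it has to serialize with every other read on the same stream. That is the cost a pure positional read avoids.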



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org