Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2018/04/16 20:30:00 UTC

[jira] [Commented] (HBASE-20430) Improve store file management for non-HDFS filesystems

    [ https://issues.apache.org/jira/browse/HBASE-20430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439985#comment-16439985 ] 

Andrew Purtell commented on HBASE-20430:
----------------------------------------

I suppose this could also be handled with an enhancement to Hadoop's S3A to multiplex open files among the internal resource pools it already maintains.
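For illustration only, here is a minimal Java sketch of the kind of bounded multiplexing/pooling I mean. This is not S3A code; the StreamPool and Opener names are hypothetical, and a real implementation would also have to track streams currently in use before closing them on eviction.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a bounded pool of open input streams keyed by path.
 * Evicting the least recently used entry closes its underlying stream, so the
 * number of concurrently open connections stays at or below 'capacity'.
 */
public class StreamPool<S extends Closeable> implements Closeable {
  private final int capacity;
  private final Map<String, S> streams;

  public StreamPool(int capacity) {
    this.capacity = capacity;
    // Access-ordered LinkedHashMap gives us simple LRU eviction.
    this.streams = new LinkedHashMap<String, S>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
        if (size() > StreamPool.this.capacity) {
          closeQuietly(eldest.getValue()); // NOTE: ignores in-use tracking
          return true;
        }
        return false;
      }
    };
  }

  /** Return a pooled stream for the path, opening one only if necessary. */
  public synchronized S acquire(String path, Opener<S> opener) throws IOException {
    S s = streams.get(path);
    if (s == null) {
      s = opener.open(path); // e.g. a real open against the object store
      streams.put(path, s);
    }
    return s;
  }

  @Override
  public synchronized void close() {
    streams.values().forEach(StreamPool::closeQuietly);
    streams.clear();
  }

  private static void closeQuietly(Closeable c) {
    try { c.close(); } catch (IOException ignored) { }
  }

  /** Hook for opening the underlying stream; illustrative only. */
  public interface Opener<S> {
    S open(String path) throws IOException;
  }
}
{code}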

> Improve store file management for non-HDFS filesystems
> ------------------------------------------------------
>
>                 Key: HBASE-20430
>                 URL: https://issues.apache.org/jira/browse/HBASE-20430
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Priority: Major
>
> HBase keeps a file open for every active store file so that no additional round trips to the NameNode are needed after the initial open. HDFS internally multiplexes open files, but the Hadoop S3 filesystem implementations do not, or at least not as well. As the volume of data under management increases, the required number of concurrently open connections rises, and we expect it will eventually exhaust a limit somewhere (in the client, the OS file descriptor table or open file limits, or the S3 service).
> Initially we can simply introduce an option to close every store file after the reader has finished, and determine the performance impact. Use cases backed by non-HDFS filesystems already have to cope with a different read performance profile. Based on experiments with the S3-backed Hadoop filesystems, notably S3A, even with aggressively tuned options simple reads can be very slow when there are block cache misses; we observed 15-20 seconds for a Get of a single small row, for example. We already expect extensive use of the BucketCache to mitigate this in such deployments. It could be backed by offheap storage, but more likely by a large number of cache files managed by the file engine on local SSD storage. If misses are already going to be super expensive, then the motivation to do more than simply open store files on demand is largely absent.
> Still, we could employ a predictive cache. Where frequent access to a given store file (or, at least, its store) is predicted, keep a reference to the store file open. We can keep statistics about read frequency, write them out to HFiles during compaction, and consult these stats when opening the region, perhaps by reading all meta blocks of the region's HFiles at open time. Otherwise, close the file after reading and open it again on demand. We need to be careful not to use ARC or an equivalent as the cache replacement strategy because it is patent-encumbered. The size of the cache can be determined at startup after detecting the underlying filesystem, e.g. setCacheSize(VERY_LARGE_CONSTANT) if (fs instanceof DistributedFileSystem), so we lose little when still on HDFS (a sketch of this follows below the quoted description).
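As a follow-up to the sizing idea in the last paragraph of the description above, here is a hypothetical sketch of choosing the open-reader cache size based on the detected filesystem. ReaderCacheSizing, OBJECT_STORE_DEFAULT, and any setCacheSize caller are illustrative assumptions, not existing HBase APIs.

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

/**
 * Hypothetical sketch of filesystem-dependent sizing for an open store file
 * reader cache. Only HDFS gets the effectively unbounded size, so behavior
 * there is unchanged.
 */
public final class ReaderCacheSizing {
  // Effectively unbounded: keep every reader open, as HBase does today.
  private static final int VERY_LARGE_CONSTANT = Integer.MAX_VALUE;
  // A modest bound for object stores such as S3; the value is a guess to be
  // tuned by experiment, not a recommendation.
  private static final int OBJECT_STORE_DEFAULT = 1024;

  private ReaderCacheSizing() { }

  public static int chooseCacheSize(FileSystem fs) {
    // Kept close to the original pseudocode in the description.
    if (fs instanceof DistributedFileSystem) {
      return VERY_LARGE_CONSTANT;
    }
    return OBJECT_STORE_DEFAULT;
  }
}
{code}

A caller could then do something like readerCache.setCacheSize(ReaderCacheSizing.chooseCacheSize(fs)) at region server startup: on HDFS nothing effectively changes, while on S3A and friends the bound keeps file descriptor and connection counts in check.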



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)