Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2014/08/15 01:05:18 UTC

[jira] [Commented] (HBASE-6572) Tiered HFile storage

    [ https://issues.apache.org/jira/browse/HBASE-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097880#comment-14097880 ] 

Andrew Purtell commented on HBASE-6572:
---------------------------------------

It looks like HDFS, under the umbrella of the archival storage JIRAs, might be getting close to supporting policy-based, device-aware block placement over heterogeneous block storage. The umbrella is HDFS-6584, with HDFS-5682 as a related prerequisite. HDFS-6670 adds block storage policies. HDFS-6835 adds an API for setting block storage policies on files. HDFS-6847 will add support for setting block storage policies on directories. I think that once these pieces are in place, we can change storage policies on CF directories to move HFile data between heterogeneous storage tiers.
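
To make the idea concrete, here is a rough sketch of how per-CF read statistics might be turned into a plan of storage policy changes on CF directories. Everything in it is an assumption for illustration: the FamilyStats record, the read-rate threshold, and the function names are hypothetical; the policy names HOT and ALL_SSD depend on what the HDFS version in use actually defines; and the directory layout assumes the namespace-aware <root>/data/<namespace>/<table>/<region>/<family> structure. It only computes the plan and makes no HDFS calls.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Policy names are assumptions; availability depends on the HDFS version.
HOT_POLICY = "HOT"      # default tier: replicas on spinning media
SSD_POLICY = "ALL_SSD"  # all replicas on solid state


@dataclass(frozen=True)
class FamilyStats:
    """Hypothetical per-column-family read statistics."""
    table: str
    region: str
    family: str
    random_reads_per_sec: float


def family_dir(root: str, stats: FamilyStats, namespace: str = "default") -> str:
    """Build the CF directory path, assuming <root>/data/<ns>/<table>/<region>/<family>."""
    return "/".join(
        [root.rstrip("/"), "data", namespace, stats.table, stats.region, stats.family]
    )


def choose_policy(stats: FamilyStats, ssd_threshold: float = 1000.0) -> str:
    """Pick the SSD tier for CFs at or above the (hypothetical) read-rate threshold."""
    if stats.random_reads_per_sec >= ssd_threshold:
        return SSD_POLICY
    return HOT_POLICY


def plan_policy_changes(
    root: str,
    all_stats: Iterable[FamilyStats],
    current: Dict[str, str],
    ssd_threshold: float = 1000.0,
) -> Dict[str, str]:
    """Return {cf_directory: policy} only for directories whose policy should change.

    Directories missing from `current` are assumed to be on the default HOT policy.
    """
    plan = {}
    for stats in all_stats:
        path = family_dir(root, stats)
        wanted = choose_policy(stats, ssd_threshold)
        if current.get(path, HOT_POLICY) != wanted:
            plan[path] = wanted
    return plan
```

An admin interface or an autotiering chore could then apply each entry of the plan through whatever HDFS call ends up setting a directory's storage policy. Note that a policy change by itself most likely only affects where new blocks are placed, so existing HFile blocks would still need to be migrated or rewritten, for example by a compaction.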

> Tiered HFile storage
> --------------------
>
>                 Key: HBASE-6572
>                 URL: https://issues.apache.org/jira/browse/HBASE-6572
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Andrew Purtell
>
> Consider how we might enable tiered HFile storage. If HDFS has the capability, we could create certain files on solid state devices where they might be frequently accessed, especially for random reads, and others (by default) on spinning media as before. We could support moving frequently read HFiles from spinning media to solid state. We already have CF statistics for this and would only need to add the requisite admin interface; we could even consider an autotiering option.
> Dhruba Borthakur did some early work in this area and wrote up his findings: http://hadoopblog.blogspot.com/2012/05/hadoop-and-solid-state-drives.html . It is important to note the findings, but I suggest most of the recommendations are out of scope for this JIRA. This JIRA seeks to find an initial use case that produces a reasonable benefit and serves as a testbed for further improvements. If I may paraphrase Dhruba's findings (any misstatements and errors are mine): First, the DFSClient code paths introduce significant latency, so the HDFS client (and presumably the DataNode, as the next bottleneck) will need significant work to knock that down. We need to investigate optimized (perhaps read-only) DFS clients and server-side read and caching strategies. Second, RegionServers are heavily threaded, and this imposes a lot of monitor contention and context switching cost. We need to investigate reducing the number of threads in a RegionServer, and nonblocking IO and RPC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)