Posted to issues@hbase.apache.org by "Reid Chan (Jira)" <ji...@apache.org> on 2021/12/01 10:17:00 UTC
[jira] [Commented] (HBASE-26519) StoreFileScanner parallel seek -- productionize or drop?

    [ https://issues.apache.org/jira/browse/HBASE-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451691#comment-17451691 ] 

Reid Chan commented on HBASE-26519:
-----------------------------------

This sounds more like a discussion topic. Would you mind posting it to the dev@hbase mailing list?

> StoreFileScanner parallel seek -- productionize or drop?
> --------------------------------------------------------
>
>                 Key: HBASE-26519
>                 URL: https://issues.apache.org/jira/browse/HBASE-26519
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Minor
>
> hbase.storescanner.parallel.seek.enable was added a few years ago in https://issues.apache.org/jira/browse/HBASE-7495, but still defaults to disabled. The description of that says "Enables StoreFileScanner parallel-seeking in StoreScanner, a feature which can reduce response latency under special conditions".
> It's not very clear what "special conditions" means. Reading through the entire comment history on that issue seems to indicate it can help when you have "high random read, low cache hit rate, many store files". 
> We have a bunch of clusters with this shape, and in fact we use SSDs for all storage so I figured this might help a lot. I tried setting this to true on one RegionServer of one of our highest QPS clusters hoping I'd see some clear improvement. This very simple test was pretty much a wash, so I need to do more methodical testing.
> In the test, one question did come up, though: is the default thread pool size of 10 good enough for my use-case? I have no way of knowing, as there are no logs or metrics that I can find around thread pool saturation. What I ended up doing was repeatedly refreshing the /dump endpoint of the RS, and I noticed that there were sometimes 1-5 tasks queued for the RS_PARALLEL_SEEK executor. This suggests I should perhaps scale up the thread pool, but use-cases change over time, so this does not seem like a great way to determine that.
> Task queuing seems like a poor fit for a feature that is aimed at reducing latencies. I wonder if we should consider some changes to make this easier to deploy in production. Here are some ideas I had:
>  * Can we generate a better default value for the thread pool size, maybe based on number of RS handler threads or some other heuristic?
>  * Should we consider eliminating queuing for this feature? Instead, if the thread pool is saturated, run the seek inline in the current thread (i.e. revert to normal). This would be more similar to how hedged reads work in HDFS.
>  * Can we expose a metric or logging to help operators know when to scale up the thread pool? If we implemented the 2nd option above, we could expose a "seeksInCurrentThread" counter to track this, again similar to how hedged reads report on saturation.
> But with all of this said, I wonder if anyone is running this in production and has any updated guidance on when to use this? Does it still make sense given the last 8 years of development in HBase? Would it ever make sense to make it enabled by default?
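
For discussion purposes, the second and third ideas in the quoted description (no queuing, inline fallback when the pool is saturated, and a "seeksInCurrentThread" counter) could be sketched roughly as below. This is only an illustration using plain java.util.concurrent, not HBase's actual RS_PARALLEL_SEEK executor code. The class name InlineFallbackSeekPool and its methods are hypothetical; only the counter name comes from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a seek pool that never queues and falls back to the caller's thread. */
public class InlineFallbackSeekPool {
    // Counts seeks that ran in the calling thread because every pool thread was busy.
    private final AtomicLong seeksInCurrentThread = new AtomicLong();
    private final ThreadPoolExecutor pool;

    public InlineFallbackSeekPool(int maxThreads) {
        // A SynchronousQueue has no capacity, so a task is either handed to an
        // idle worker (or a newly started one) or rejected immediately.
        this.pool = new ThreadPoolExecutor(
            maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
            new SynchronousQueue<>(),
            (task, executor) -> {
                // Saturated: record it and run the seek inline instead of queuing.
                seeksInCurrentThread.incrementAndGet();
                task.run();
            });
    }

    /** Runs all seeks, in parallel where threads are free, and waits for completion. */
    public void seekAll(List<Runnable> seeks) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(seeks.size());
        for (Runnable seek : seeks) {
            pool.execute(() -> {
                try {
                    seek.run();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
    }

    public long seeksInCurrentThread() {
        return seeksInCurrentThread.get();
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        InlineFallbackSeekPool seekPool = new InlineFallbackSeekPool(2);
        List<Runnable> seeks = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            // Stand-in for a per-store-file seek; sleeps briefly to simulate I/O.
            seeks.add(() -> {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        seekPool.seekAll(seeks);
        System.out.println("seeksInCurrentThread=" + seekPool.seeksInCurrentThread());
        seekPool.shutdown();
    }
}
```

A steadily growing counter would then be the signal that the pool is too small, which avoids having to poll the /dump endpoint. Whether HBase's own executor service could be adapted this way is an open question for the discussion.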



--
This message was sent by Atlassian Jira
(v8.20.1#820001)