Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/08/28 06:24:21 UTC
[jira] [Commented] (KUDU-1582) maintenance manager scheduling very
slow on TS with lots of data
[ https://issues.apache.org/jira/browse/KUDU-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15442885#comment-15442885 ]
Todd Lipcon commented on KUDU-1582:
-----------------------------------
Did a little analysis, and it seems like essentially all of the time is spent in the knapsack solver.
I also grabbed the rowset layout from one of these big tablets and analyzed it. It looks like we can optimize this significantly (8-10x at least) by computing a lower-bound solution (which may not use the entire 'knapsack budget') and comparing it to a computed upper bound. If the lower-bound solution (which is very fast to compute) is within some percentage of the upper bound, we can skip the more expensive full knapsack solve.
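To make the idea concrete, here is a rough Python sketch of the shortcut, not the actual Kudu code. It assumes a plain 0/1 knapsack with positive integer weights, uses a density-ordered greedy fill as the lower bound and the fractional relaxation as the upper bound, and picks an arbitrary 5% tolerance. All names (greedy_bounds, solve_exact, select, tolerance) are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[int, float]  # (weight, value); weights assumed to be positive integers


def greedy_bounds(items: List[Item], budget: int) -> Tuple[float, List[int], float]:
    """Return (lower_bound, chosen_indices, upper_bound) in one pass."""
    # Visit items from highest to lowest value density.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][1] / items[i][0], reverse=True)
    lower, chosen, remaining = 0.0, [], budget
    upper, upper_remaining = 0.0, float(budget)
    for i in order:
        w, v = items[i]
        # Upper bound: fractional relaxation, allowed to take part of an item.
        if upper_remaining > 0:
            take = min(w, upper_remaining)
            upper += v * take / w
            upper_remaining -= take
        # Lower bound: whole items only, so it may leave budget unused.
        if w <= remaining:
            chosen.append(i)
            lower += v
            remaining -= w
    return lower, chosen, upper


def solve_exact(items: List[Item], budget: int) -> Tuple[float, List[int]]:
    """Classic O(n * budget) dynamic program for 0/1 knapsack."""
    best = [0.0] * (budget + 1)
    taken = [[False] * (budget + 1) for _ in items]
    for i, (w, v) in enumerate(items):
        for b in range(budget, w - 1, -1):
            cand = best[b - w] + v
            if cand > best[b]:
                best[b] = cand
                taken[i][b] = True
    # Walk back through the decisions to recover the chosen items.
    chosen, b = [], budget
    for i in range(len(items) - 1, -1, -1):
        if taken[i][b]:
            chosen.append(i)
            b -= items[i][0]
    chosen.reverse()
    return best[budget], chosen


def select(items: List[Item], budget: int, tolerance: float = 0.05):
    """Use the cheap greedy answer when it is provably close to optimal."""
    lower, chosen, upper = greedy_bounds(items, budget)
    if upper == 0 or lower >= (1 - tolerance) * upper:
        return lower, sorted(chosen), "greedy"
    value, chosen = solve_exact(items, budget)
    return value, chosen, "exact"
```

Since the optimal value always lies between the two bounds, a greedy answer within 5% of the upper bound is also within 5% of optimal, and the expensive solve only runs when the bounds are far apart.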
> maintenance manager scheduling very slow on TS with lots of data
> ----------------------------------------------------------------
>
> Key: KUDU-1582
> URL: https://issues.apache.org/jira/browse/KUDU-1582
> Project: Kudu
> Issue Type: Bug
> Components: perf, tserver
> Affects Versions: 0.10.0
> Reporter: Todd Lipcon
> Attachments: trace.json.gz
>
>
> On a server with ~5.5TB of data, the maintenance manager scheduler thread has gotten quite slow. The thread takes many tens of seconds to pick a maintenance operation, and then the actual operations take only a few seconds to run. So, the actual "duty cycle" of those threads is quite low, and compaction/flushing falls behind.