Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2019/02/20 19:37:00 UTC

[jira] [Created] (KUDU-2704) Rowsets that are much bigger than the target size discourage compactions

Will Berkeley created KUDU-2704:
-----------------------------------

             Summary: Rowsets that are much bigger than the target size discourage compactions
                 Key: KUDU-2704
                 URL: https://issues.apache.org/jira/browse/KUDU-2704
             Project: Kudu
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Will Berkeley
            Assignee: Will Berkeley


In KUDU-2701, I fixed a KUDU-1400-related compaction loop. The size used for compaction counted only the base data and redos, so compacting rowsets that looked small but weren't was effectively a no-op, resulting in a compaction loop. Now, rowset count / KUDU-1400 compactions use the whole rowset size.

While testing something on a table with 279 columns, I noticed that almost all rowsets were being flushed at a size of 80-90MB and that, even though the tablet height was increasing rapidly and was above 20, almost no compactions were happening. Looking into it, when the total size of a rowset is far above the target size, we assign a large negative score to including the rowset in a compaction, since the score is proportional to 1 - size/target size. This problem has always existed; it just got worse because the size now includes more things.
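To make the size effect concrete, here is a minimal Python sketch of only the size-dependent factor described above, 1 - size/target size. It is not Kudu's actual compaction policy code: the function name size_score_factor is hypothetical, and the 32MB target size is an assumption chosen for illustration.

```python
MB = 1024 * 1024

# Assumed target rowset size, for illustration only.
ASSUMED_TARGET_SIZE_BYTES = 32 * MB


def size_score_factor(rowset_size_bytes, target_size_bytes=ASSUMED_TARGET_SIZE_BYTES):
    """Return the size-dependent factor 1 - size/target size.

    Hypothetical helper: the real score is only said to be proportional
    to this quantity, so other factors are deliberately left out.
    """
    return 1.0 - rowset_size_bytes / target_size_bytes


if __name__ == "__main__":
    # Compare a small rowset, one at the target, and one in the 80-90MB range.
    for size_mb in (8, 32, 85):
        factor = size_score_factor(size_mb * MB)
        print(f"{size_mb:>3}MB rowset -> size factor {factor:+.2f}")
```

Under these assumptions, a rowset at the target size contributes a factor of zero, while an 85MB rowset contributes roughly -1.66, so including it in a compaction is penalized even when the tablet height is large.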



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)