
Posted to issues@kudu.apache.org by "Andrew Wong (Jira)" <ji...@apache.org> on 2020/06/04 07:06:00 UTC

[jira] [Commented] (KUDU-3110) tserver data folder too large

    [ https://issues.apache.org/jira/browse/KUDU-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125606#comment-17125606 ] 

Andrew Wong commented on KUDU-3110:
-----------------------------------

[~SeaAndHill] sorry we didn't get to the bottom of this earlier. There isn't much evidence to go on, but given that 1) it was reported against Kudu 1.7, 2) data was inserted fairly slowly (3GB per day is a low rate), and 3) recreating the table reduced its size, one issue that comes to mind is KUDU-1400, which was fixed in Kudu 1.9.

That issue happens if you insert slowly and in sorted primary key order: Kudu would create many small, non-overlapping DiskRowSets (DRSs) and would not compact them together. The result was poor scan performance and potentially high disk usage, because columnar data does not encode as compactly in such small chunks. Does that sound like what you saw?
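To illustrate why slow, sorted inserts could escape compaction, here is a rough Python sketch. It assumes a simplified model loosely based on Kudu's compaction policy, which (roughly) favors compactions that reduce the average "rowset height", i.e. how many rowsets cover a given key. Each rowset is reduced to its (min key, max key) range, and the helper names and the rows_per_flush parameter are hypothetical, not Kudu APIs. With sorted keys, every flush produces a rowset that doesn't overlap the others, so merging them leaves the height at 1 and the model sees no benefit. With shuffled keys, the rowsets overlap heavily and merging them pays off.

```python
import random


def flush_rowsets(keys, rows_per_flush):
    """Group an insert stream into rowsets, one per flush of `rows_per_flush` rows.

    Hypothetical model: a rowset is represented only by its (min_key, max_key)
    range, which is all the height calculation below needs.
    """
    rowsets = []
    for start in range(0, len(keys), rows_per_flush):
        batch = keys[start:start + rows_per_flush]
        rowsets.append((min(batch), max(batch)))
    return rowsets


def average_height(rowsets, key_space):
    """Average number of rowsets whose key range covers each key in key_space."""
    total = 0
    for key in key_space:
        total += sum(1 for lo, hi in rowsets if lo <= key <= hi)
    return total / len(key_space)


def compaction_benefit(rowsets, key_space):
    """Height reduction from merging all the given rowsets into a single rowset."""
    merged = [(min(lo for lo, _ in rowsets), max(hi for _, hi in rowsets))]
    return average_height(rowsets, key_space) - average_height(merged, key_space)


if __name__ == "__main__":
    num_rows, rows_per_flush = 10_000, 100  # hypothetical slow-ingest numbers
    key_space = range(num_rows)

    # Sorted inserts: each flush covers a disjoint slice of the key space.
    sorted_rowsets = flush_rowsets(list(key_space), rows_per_flush)

    # Random-order inserts: each flush spans nearly the whole key space.
    shuffled = list(key_space)
    random.Random(42).shuffle(shuffled)
    random_rowsets = flush_rowsets(shuffled, rows_per_flush)

    for name, rowsets in (("sorted", sorted_rowsets), ("random", random_rowsets)):
        print(f"{name} inserts: {len(rowsets)} rowsets, "
              f"height={average_height(rowsets, key_space):.2f}, "
              f"benefit={compaction_benefit(rowsets, key_space):.2f}")
```

In this model, the sorted case reports a height of 1.00 and a benefit of 0.00. That is the KUDU-1400 situation: many small rowsets that a purely height-driven policy has no reason to merge. The real policy is more involved (for one, it works within an I/O budget), so treat this only as an illustration.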

> tserver data folder too large
> -----------------------------
>
>                 Key: KUDU-3110
>                 URL: https://issues.apache.org/jira/browse/KUDU-3110
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.1
>            Reporter: SeaAndHill
>            Priority: Critical
>         Attachments: kudu use disk.png
>
>
> There are about 100,000 rows in one table, and the Kudu tserver data directory uses 50G.


