Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2019/02/11 18:02:00 UTC

[jira] [Updated] (KUDU-2693) Buffer DiskRowSet flushes to more efficiently write many columns

     [ https://issues.apache.org/jira/browse/KUDU-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Berkeley updated KUDU-2693:
--------------------------------
    Code Review: https://gerrit.cloudera.org/#/c/12425/

> Buffer DiskRowSet flushes to more efficiently write many columns
> ----------------------------------------------------------------
>
>                 Key: KUDU-2693
>                 URL: https://issues.apache.org/jira/browse/KUDU-2693
>             Project: Kudu
>          Issue Type: Improvement
>          Components: fs, tablet
>    Affects Versions: 1.9.0
>            Reporter: Mike Percy
>            Assignee: Todd Lipcon
>            Priority: Major
>
> A trace of some MRS flushes on a table with 280 columns showed that some 695 fdatasync() calls occurred during the course of the flush.
> One possible way to reduce the number of fdatasync() calls would be to flush to memory buffers first, determine the ideal on-disk layout for the flushed blocks (possibly striped across one log block container per data disk), and then potentially write the data out to the containers in parallel. This would require reserving some memory buffer space per maintenance manager thread, possibly 64MB, since the DRS roll size is 32MB.
> According to Todd, we could probably do it all in LogBlockManager by adding a new flag to CreateBlockOptions that says whether to buffer, or something like that.


