Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/01/17 20:45:26 UTC

[jira] [Commented] (KUDU-1835) Support compression of the WAL

    [ https://issues.apache.org/jira/browse/KUDU-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826771#comment-15826771 ] 

Todd Lipcon commented on KUDU-1835:
-----------------------------------

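As a data point, here's a little script I ran on a WAL dir from an internal production workload. It shows that LZOP gets 9-10x compression and gzip gets 14-15x on most of the files; the first two files compress less well (7-8x with LZOP, 11x with gzip). Each output line is the raw size, the LZOP size (ratio), and the gzip size (ratio), with sizes in bytes: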

{code}
# for x in /data/1/kudu/tablet/wal/wals/c9d36f087779437a812036db75d7e006/wal* ; do raw_size=$(stat -c '%s' "$x") ; gzip_size=$(gzip -c < "$x" | wc -c) ; lzop_size=$(lzop -c < "$x" | wc -c) ; echo "$raw_size $lzop_size ($((raw_size/lzop_size))x) $gzip_size ($((raw_size/gzip_size))x)" ; done
67914806 8822979 (7x) 5801108 (11x)
69050539 8587242 (8x) 5786937 (11x)
67752983 6745962 (10x) 4591334 (14x)
68524538 6452417 (10x) 4316684 (15x)
69306281 6805018 (10x) 4548035 (15x)
67832665 7254455 (9x) 4826115 (14x)
67112269 7164280 (9x) 4765893 (14x)
67334182 7105344 (9x) 4802748 (14x)
67744136 6938754 (9x) 4799502 (14x)
67980985 7152674 (9x) 4740059 (14x)
68014865 7076908 (9x) 4699722 (14x)
69000245 7183600 (9x) 4772002 (14x)
{code}
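
Note that the ratios in parentheses come from shell integer arithmetic, so they are rounded down (the first LZOP figure is closer to 7.7x than 7x). A minimal sketch for printing them with one decimal place, assuming awk is available; the `ratio` helper is hypothetical and was not part of the script above:

```shell
# Hypothetical helper (not from the original comment): print raw/compressed
# as a ratio with one decimal place instead of a truncated integer.
ratio() {
  awk -v raw="$1" -v comp="$2" 'BEGIN { printf "%.1f", raw / comp }'
}

# First and fourth rows of the output above: raw size, lzop size, gzip size.
echo "lzop: $(ratio 67914806 8822979)x  gzip: $(ratio 67914806 5801108)x"
echo "lzop: $(ratio 68524538 6452417)x  gzip: $(ratio 68524538 4316684)x"
```

The same helper could replace the two integer expressions in the loop's `echo` if fractional ratios are wanted for every file.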

> Support compression of the WAL
> ------------------------------
>
>                 Key: KUDU-1835
>                 URL: https://issues.apache.org/jira/browse/KUDU-1835
>             Project: Kudu
>          Issue Type: Improvement
>          Components: log, perf
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> In some workloads, particularly those which get good compression rates of the underlying data (e.g. via dictionary coding), the WAL becomes a big bottleneck for write performance. In addition, the large size of WALs can often mean that old WALs get GCed rapidly, which causes lagging replicas to get evicted after only a temporary bout of slowness. Making WALs smaller would mean that we can retain more history without using more disk space.


