Posted to dev@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@ip-10-146-233-104.ec2.internal> on 2016/02/02 01:03:00 UTC

[kudu-CR] Enable compression and smaller block size for composite key index

Todd Lipcon has uploaded a new patch set (#2).

Enable compression and smaller block size for composite key index

After running the time series workload on d2106 for a couple of months,
I found a couple of interesting things:

- The composite key index (aka "ad hoc index") was taking 6.4 bytes
  per row (vs 5.23 *bits* for the actual data). Compressing it with
  'lzop' on that dataset yielded a 6.2x reduction in size.

Thus, this patch changes this index to be compressed using LZ4 by default,
which should save space. On the tpch lineitem table, it saved about 15%.
The performance cost should be fairly minimal -- we always random-access
the index blocks, and in the case of a cache miss, the cost of decompression
is tiny compared to the cost of the resulting disk seek.
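
As a rough check on the numbers above, the following sketch redoes the
per-row arithmetic. The three constants are the figures quoted in this
message; the function names are made up for illustration. Note that the
6.2x ratio was measured with lzop, so the ratio actually achieved with
LZ4 may differ.

```python
# Figures quoted in the message above.
INDEX_BYTES_PER_ROW = 6.4   # composite key index, uncompressed
DATA_BITS_PER_ROW = 5.23    # actual column data
LZOP_RATIO = 6.2            # measured with lzop; LZ4 may compress differently


def data_bytes_per_row(bits_per_row=DATA_BITS_PER_ROW):
    """Convert the per-row data footprint from bits to bytes."""
    return bits_per_row / 8.0


def compressed_index_bytes_per_row(index_bytes=INDEX_BYTES_PER_ROW,
                                   ratio=LZOP_RATIO):
    """Per-row index footprint after compressing by the given ratio."""
    return index_bytes / ratio


def index_to_data_ratio(index_bytes, data_bytes):
    """How many times larger the index is than the data it points at."""
    return index_bytes / data_bytes


if __name__ == "__main__":
    data = data_bytes_per_row()
    packed = compressed_index_bytes_per_row()
    print("data bytes/row:             %.3f" % data)
    print("index bytes/row (raw):      %.3f" % INDEX_BYTES_PER_ROW)
    print("index bytes/row (packed):   %.3f" % packed)
    print("index/data ratio (raw):     %.2f" % index_to_data_ratio(INDEX_BYTES_PER_ROW, data))
    print("index/data ratio (packed):  %.2f" % index_to_data_ratio(packed, data))
```

By this estimate the uncompressed index is nearly ten times the size of
the data it indexes, and even at the lzop ratio it would still be
somewhat larger than the data.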

- Once we reached ~12B rows, the system degenerated into a seeky mess.
  Tracing revealed that we spent a lot of time reading composite
  indexes, indicating they weren't fitting well in the cache.

I theorize that making these index blocks smaller should decrease the
amount of excess data that gets pulled into the cache when we read them.
Given that these blocks are always random-accessed and never scanned,
using small block sizes makes intuitive sense.
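
The intuition can be sketched with a toy model, assuming each random
lookup pulls exactly one whole index block into the cache and needs only
a single entry from it. The cache size, block sizes, and entry size
below are hypothetical values chosen for illustration; they are not the
old or new defaults from this patch.

```python
def blocks_resident(cache_bytes, block_bytes):
    """Number of distinct index blocks that fit in the cache at once."""
    return cache_bytes // block_bytes


def useful_fraction(entry_bytes, block_bytes):
    """Fraction of a cached block that a single-entry lookup actually needs."""
    return entry_bytes / block_bytes


if __name__ == "__main__":
    cache = 1 << 30                        # hypothetical 1 GiB block cache
    entry = 8                              # hypothetical bytes per index entry
    for block in (256 * 1024, 4 * 1024):   # hypothetical block sizes
        print("block=%7d B  resident blocks=%7d  useful fraction=%.6f"
              % (block, blocks_resident(cache, block),
                 useful_fraction(entry, block)))
```

Under these assumptions, shrinking the block size by some factor lets
the same cache hold that many more distinct blocks, and cuts the excess
data pulled in per lookup by the same factor.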

Eventually, both of these options should be table properties, but it
was easier to just set better defaults for now as a quick improvement.

Change-Id: I2b7bfc7a4961c764d262524292ec56e3969af728
---
M src/kudu/tablet/diskrowset.cc
1 file changed, 6 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/953/2
-- 
To view, visit http://gerrit.cloudera.org:8080/953

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b7bfc7a4961c764d262524292ec56e3969af728
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>