Posted to dev@hbase.apache.org by Enis Söztutar <en...@hortonworks.com> on 2012/05/23 21:51:14 UTC
hbase hackathon notes
Hi devs,
We are having a nice hackathon today at Cloudera's offices in Palo
Alto. 30+ people showed up, including most of the committers. In
the morning, there were some discussions related to recent issues. Here are
my notes:
JD - hypertable performance comparison
- their tuning is wrong
- JD tested both: same hypertable numbers, hbase tests finished, but
hbase was slow
- first do a lot of splits, then slow the splits down
- compactions are smarter for hypertable
- smaller memstore is faster, as it fills up, it gets slower
- client does not wait for flush commits, it does that asynchronously; JD
used an async client to get comparable numbers
Matt - hotpads
- talked about prefix compression, trie data encoding (HBASE-4676)
- went over the chart in the jira ticket
- random reads, bigger block sizes
- does not work very well for md5 prefixed keys, you should partition
using a single byte
- write speed is affected (an order of magnitude slower for writes compared
to None encoding), see the attached pdf in the ticket
- a lot of improvement options for the key-value heap/block cache/encoding
internal APIs
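A minimal sketch of one reading of the "partition using a single byte" note above: instead of prefixing each row key with a full MD5 digest (which leaves neighbouring keys with no common prefix to compress), put one salt byte in front of the natural key so that keys inside a partition still share prefixes. The names `NUM_BUCKETS`, `salted_key` and `original_key` are hypothetical, the bucket count is an arbitrary assumption, and Python is used only for illustration; this is not taken from the jira ticket.

```python
import hashlib

# Hypothetical bucket count; it has to fit in one byte (1..256).
NUM_BUCKETS = 16


def salted_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Return row_key with a single salt byte in front of it.

    The salt is derived from the MD5 of the key, so the same key always
    lands in the same partition, while the rest of the key stays in its
    natural form and keeps sharing prefixes with its neighbours.
    """
    if not 1 <= num_buckets <= 256:
        raise ValueError("num_buckets must fit in a single byte")
    bucket = hashlib.md5(row_key).digest()[0] % num_buckets
    return bytes([bucket]) + row_key


def original_key(salted: bytes) -> bytes:
    """Strip the one-byte salt to recover the natural row key."""
    return salted[1:]
```

The trade-off in this sketch is that a range scan over natural keys has to be issued once per bucket, since each salt value forms its own contiguous key range.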
Todd - performance
- demo of oprofile, ycsb test
- uses hw counters, shows actual CPU clocks, L1, L2 cache hits/misses,
etc. Uses a custom jvm agent for profiling java
- crc32 from hadoop libzip, URI, KeyValue comparator, etc
Jimmy - pb
- remaining things: coprocessors, rpc engine, meta table, some minor
things
- we should not expose too many rpc internals to coprocessors, and should
not make it too difficult
- continue discussion on jira
Jesse - mvn modules
- cross module dependencies should be eliminated
- hbase-server, hbase-client, hbase-shared at lower level, we should think
about mini-cluster
Lars - durable sync
- hflush / hsync
- hacky flush blocks on close mode
- disk io is bursty as it is, we should smooth it out
- maybe make it configurable per column family
David - testing
- rc testing
- aggregate test results in a wiki or something for each rc
- binary/ source release issues
- need to recompile hbase against hadoop 1 and 2, jenkins build for each
- 0.96, hadoop-1 and hadoop-2
- compatibility tests, we do not have any, we can add them to the checklist
Andrew - async hbase
- build sync client on top of async
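A rough sketch of the "sync on top of async" idea, assuming a toy in-memory client: the blocking API is just a thin wrapper that submits the async call and waits for its result. `AsyncClient` and `SyncClient` are hypothetical names, not asynchbase or HBase APIs, and Python's asyncio is used only for illustration.

```python
import asyncio


class AsyncClient:
    """Hypothetical async client, backed by an in-memory dict."""

    def __init__(self):
        self._store = {}

    async def put(self, row, value):
        await asyncio.sleep(0)  # stand-in for a non-blocking RPC
        self._store[row] = value

    async def get(self, row):
        await asyncio.sleep(0)  # stand-in for a non-blocking RPC
        return self._store.get(row)


class SyncClient:
    """Blocking facade: every call runs the async call to completion."""

    def __init__(self, async_client):
        self._client = async_client
        self._loop = asyncio.new_event_loop()

    def put(self, row, value):
        self._loop.run_until_complete(self._client.put(row, value))

    def get(self, row):
        return self._loop.run_until_complete(self._client.get(row))

    def close(self):
        self._loop.close()
```

With this layering there is one code path to maintain: callers that want blocking semantics use the wrapper, and callers that care about throughput talk to the async client directly.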
Jesse - snapshots
go around the room for integrations
Huddle groups for topics above
Keep hacking,
Enis