Posted to dev@chukwa.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2013/11/02 22:09:19 UTC
[jira] [Updated] (CHUKWA-700) Revisit Chukwa metrics schema design for HBase
[ https://issues.apache.org/jira/browse/CHUKWA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Yang updated CHUKWA-700:
-----------------------------
Description:
The current Chukwa HBase schema looks like this:
{code}
<timestamp>-<primaryKey> <columnFamily>:<cell>...
{code}
Monotonically increasing timestamps cannot be distributed evenly across region servers without special handling and periodic care.
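To illustrate the problem, here is a minimal sketch. It relies on the fact that HBase keeps rows sorted by row key and that each region serves a contiguous key range. The class name, region names, split points, and sample keys are all hypothetical, and the region lookup is modelled with a plain sorted map rather than any HBase API.

```java
import java.util.TreeMap;

class HotSpotSketch {
    // Regions are modelled as a sorted map from region start key to region name.
    // This is an illustration only, not how HBase is actually queried.
    static String regionFor(TreeMap<String, String> regions, String rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-1");               // hypothetical split points
        regions.put("1383400000000", "region-2");
        regions.put("1383420000000", "region-3");

        long now = 1383430159000L;                 // sample timestamp in milliseconds
        for (int i = 0; i < 3; i++) {
            // Row key in the current layout: <timestamp>-<primaryKey>
            String key = (now + i * 1000L) + "-host" + i;
            System.out.println(key + " -> " + regionFor(regions, key));
        }
    }
}
```

Because each new key sorts after every earlier key, all three sample writes land in the last region, whichever host they come from.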
It is time to revise the schema. The proposed schema looks like this:
{code}
<hhddmmyyyy>-<primaryId> cf:<cell>...
{code}
The timestamp is stored with each cell. The row key splits data by hour, so a full hour of metrics is stored on the same row. PrimaryKey is replaced with a hash id of the primary key. Metrics are aggregated through these tables:
chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly
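As a rough sketch of how such a row key could be built: the ticket does not name a hash function or a time zone, so the truncated MD5 digest, the UTC zone, and the names `RowKeySketch`, `hashId`, and `rowKey` below are all assumptions. The date pattern is a literal reading of `<hhddmmyyyy>`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class RowKeySketch {
    // Literal reading of <hhddmmyyyy>: hour, day, month, year. UTC is an assumption.
    private static final DateTimeFormatter HOUR_BUCKET =
        DateTimeFormatter.ofPattern("HHddMMyyyy").withZone(ZoneOffset.UTC);

    // Hypothetical hash id: first 8 bytes of the MD5 digest, hex encoded.
    public static String hashId(String primaryKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(primaryKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Row key in the proposed layout: <hhddmmyyyy>-<primaryId>
    public static String rowKey(long epochMillis, String primaryKey) {
        return HOUR_BUCKET.format(Instant.ofEpochMilli(epochMillis)) + "-" + hashId(primaryKey);
    }

    public static void main(String[] args) {
        long t = Instant.parse("2013-11-02T22:09:19Z").toEpochMilli();
        // Two samples from the same hour and source share one row key.
        System.out.println(rowKey(t, "host1.example.com"));
        System.out.println(rowKey(t + 60_000L, "host1.example.com"));
    }
}
```

In this sketch the exact timestamp is not part of the key. It would travel with each cell, as described above, so all samples from one source within an hour fall on the same row.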
was:
The current Chukwa HBase schema looks like this:
{code}
<columnFamily>
<timestamp>-<primaryKey> <cell>...
{code}
Monotonically increasing timestamps cannot be distributed evenly across region servers without special handling and periodic care.
It is time to revise the schema. The proposed schema looks like this:
{code}
<cf>
<hhddmmyyyy>-<primaryId> <cell>...
{code}
The timestamp is stored with each cell. The row key splits data by hour, so a full hour of metrics is stored on the same row. PrimaryKey is replaced with a hash id of the primary key. Metrics are aggregated through these tables:
chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly
> Revisit Chukwa metrics schema design for HBase
> ----------------------------------------------
>
> Key: CHUKWA-700
> URL: https://issues.apache.org/jira/browse/CHUKWA-700
> Project: Chukwa
> Issue Type: Bug
> Components: Data Collection
> Affects Versions: 0.6.0
> Environment: MacOSX, Java
> Reporter: Eric Yang