Posted to dev@chukwa.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2015/04/04 20:55:33 UTC

[jira] [Commented] (CHUKWA-667) Optimize the HBase schema for Ganglia queries

    [ https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395882#comment-14395882 ] 

Eric Yang commented on CHUKWA-667:
----------------------------------

Revising the schema one more time.  Keeping the time only in the HBase cell timestamp makes the data difficult to work with from high-level languages such as Hive or Pig, because their load and store functions do not work with the HBase timestamp value.

Table: chukwa
Row Key: [day:primary_key]
Column Family: t
Column: [timestamp]
Value: [value]
Column Family: a
Column: [timestamp]
Value: [tags]
Timestamp: [timestamp]

This schema allows annotations and tags to be attached to a timestamp, and the user can choose whether or not to query the tags.  We still update the timestamp field so that TTL continues to work.
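
To make the two-family layout concrete, here is a minimal in-memory sketch in Python. It does not use the HBase client at all; the class name ChukwaRow and its methods are hypothetical and exist only for illustration. It shows values stored under family "t" and tags under family "a", both keyed by timestamp, with a read that includes the tags only on request.

```python
class ChukwaRow:
    """Toy model of one row: two column families, each keyed by timestamp.

    Hypothetical helper for illustration only; it is not part of Chukwa
    or of the HBase client API.
    """

    def __init__(self):
        # "t" holds metric values, "a" holds annotations/tags.
        self.families = {"t": {}, "a": {}}

    def put_value(self, timestamp_ms, value):
        # Column qualifier is the timestamp, cell value is the metric value.
        self.families["t"][timestamp_ms] = value

    def annotate(self, timestamp_ms, tags):
        # Tags are attached to one specific timestamp only.
        self.families["a"][timestamp_ms] = tags

    def get(self, families=("t",)):
        # Return copies of only the requested families,
        # so tags are read only when the caller asks for "a".
        return {name: dict(self.families[name]) for name in families}


row = ChukwaRow()
row.put_value(1000, 0.5)
row.put_value(2000, 0.7)
row.annotate(2000, "deploy")

values_only = row.get()               # family "t" only
with_tags = row.get(("t", "a"))       # values plus tags
```

In a real table the same choice would presumably be made by restricting a read to one column family, but that mapping is an assumption and not something stated in this comment.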

The benefit of putting the day at the front of the row key is that regions are easier to split, which gives a more uniform data distribution among regions.  Using a separate column family for annotations makes it possible to annotate only a specific timestamp.
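
As a rough illustration of the day-prefixed row key, the Python sketch below builds keys of the form [day:primary_key] and derives one candidate split point per day. Several details are assumptions and not taken from this comment: the day is encoded as a UTC yyyymmdd string, ":" is the separator, and the function names and the example primary key are hypothetical.

```python
from datetime import datetime, timezone

MS_PER_DAY = 86_400_000


def day_of(timestamp_ms):
    """Return the UTC day of a millisecond timestamp as 'yyyymmdd' (assumed encoding)."""
    dt = datetime.fromtimestamp(timestamp_ms // 1000, tz=timezone.utc)
    return dt.strftime("%Y%m%d")


def row_key(timestamp_ms, primary_key):
    """Build a row key shaped like [day:primary_key]."""
    return f"{day_of(timestamp_ms)}:{primary_key}".encode("utf-8")


def daily_split_points(start_ms, num_days):
    """Return one split key per day boundary after the first day.

    Because every key starts with its day, all rows of one day sort
    before the next day's prefix, so the prefixes are natural split points.
    """
    return [
        day_of(start_ms + i * MS_PER_DAY).encode("utf-8")
        for i in range(1, num_days)
    ]


ts = 1428173733000  # 2015-04-04 18:55:33 UTC
key = row_key(ts, "cluster1/host1/cpu.user")  # hypothetical primary key
splits = daily_split_points(ts, 3)
```

With a fixed-width day prefix, byte-wise ordering of the keys matches chronological order of the days, which is what makes per-day split points straightforward.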

> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>
>                 Key: CHUKWA-667
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-667
>             Project: Chukwa
>          Issue Type: Sub-task
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Saisai Shao
>
> The Chukwa HBase table schema is designed for HICC; it cannot be fully adapted to the Ganglia web frontend for several reasons:
> (1) it cannot quickly retrieve all the cluster names and related host names.
> (2) system metrics have no attributes, such as type or unit, so it is hard to interpret the collected metrics in code.
> (3) it lacks a data consolidation function: choosing a metric over a large time range (such as 30 days) fetches all the data to draw the graph, which greatly degrades performance.
> We will redesign the table schema so that it is better adapted to Ganglia web frontend queries.


