Posted to dev@chukwa.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2009/07/19 05:29:08 UTC

[jira] Commented: (CHUKWA-22) Need index for chukwa sequence files

    [ https://issues.apache.org/jira/browse/CHUKWA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732961#action_12732961 ] 

Eric Yang commented on CHUKWA-22:
---------------------------------

Building an index file would not be sufficient to serve Chukwa data straight from HDFS over the long term.  The cost of keeping the index in memory will eventually require yet another distributed system just to manage the index files.  Instead of reinventing the wheel, Chukwa should adopt a Bigtable-like solution such as HBase to manage the data regions.

The mapreduce-to-hbase example (http://wiki.apache.org/hadoop/Hbase/MapReduce) looks like exactly what Chukwa needs.  An HBase table schema for Chukwa could look like this:

Table: SystemMetrics-[TimeType]
Column Family: cpu
Column Family: memory
Column Family: disk
Column Family: temperature
Column Family: network
Column Family: default
Column Family: log

Each row represents an average over one interval: a 1 minute average, a 5 minute average, etc.  The interval is determined by the time type in the table name.
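
A minimal sketch of how raw samples could be rolled up into one row per interval.  This is only an illustration in Python, not Chukwa code: the time type names ("1min", "5min"), the function names, and the sample values are all assumptions.

```python
from collections import defaultdict

# Hypothetical time types and their interval widths in seconds.
TIME_TYPES = {"1min": 60, "5min": 300}


def bucket_start(timestamp, time_type):
    """Return the start (epoch seconds) of the interval containing timestamp."""
    width = TIME_TYPES[time_type]
    return timestamp - (timestamp % width)


def average_by_bucket(samples, time_type):
    """Average (epoch_seconds, value) samples per interval.

    Returns a dict mapping interval start -> mean value; each entry
    corresponds to one row of the table for that time type.
    """
    sums = defaultdict(lambda: [0.0, 0])  # interval start -> [total, count]
    for timestamp, value in samples:
        acc = sums[bucket_start(timestamp, time_type)]
        acc[0] += value
        acc[1] += 1
    return {start: total / count for start, (total, count) in sums.items()}


# Made-up CPU idle samples for one host: (seconds, percent idle).
samples = [(0, 10.0), (30, 20.0), (60, 30.0), (299, 40.0), (300, 50.0)]
one_minute = average_by_bucket(samples, "1min")
five_minute = average_by_bucket(samples, "5min")
```

The same samples feed both tables; only the interval width changes, so a single pass per time type would be enough.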

Examples of columns could be: idle:hostname1, busy:hostname1, idle:hostname2, busy:hostname2

The log column family keeps the raw log entries for log viewing.
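
To make the layout concrete, here is a rough in-memory stand-in for the proposed table.  It does not use the HBase client API; the class and function names are hypothetical.  It also assumes two things the proposal does not state: that the row key is the interval start time, zero-padded so that keys sort in time order, and that the example columns above are qualifiers inside the cpu family.

```python
# Column families taken from the proposed schema.
COLUMN_FAMILIES = ("cpu", "memory", "disk", "temperature",
                   "network", "default", "log")


def table_name(time_type):
    """Table: SystemMetrics-[TimeType]."""
    return "SystemMetrics-" + time_type


def row_key(bucket_start_seconds):
    """Assumed row key: zero-padded epoch seconds, so string order is time order."""
    return "%010d" % bucket_start_seconds


class SketchTable:
    """Dictionary-backed stand-in for one table; not an HBase client."""

    def __init__(self, time_type):
        self.name = table_name(time_type)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, bucket_start_seconds, family, qualifier, value):
        if family not in COLUMN_FAMILIES:
            raise ValueError("unknown column family: " + family)
        row = self.rows.setdefault(row_key(bucket_start_seconds), {})
        row[family + ":" + qualifier] = value

    def scan(self, start_seconds, stop_seconds):
        """Return rows with start <= key < stop, in key order."""
        lo, hi = row_key(start_seconds), row_key(stop_seconds)
        return [(k, self.rows[k]) for k in sorted(self.rows) if lo <= k < hi]


# Illustrative values only.
table = SketchTable("1min")
table.put(1247981340, "cpu", "idle:hostname1", 92.5)
table.put(1247981340, "cpu", "busy:hostname1", 7.5)
table.put(1247981400, "cpu", "idle:hostname1", 90.0)
first_minute = table.scan(1247981340, 1247981400)
```

With this layout, a front end asking for a time range only needs a range scan over row keys, which is the job the separate index was meant to do.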


> Need index for chukwa sequence files
> ------------------------------------
>
>                 Key: CHUKWA-22
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-22
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>         Environment: Redhat EL 5.1 and Java 6
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>
> Chukwa has the ability to collect a large volume of data, but the lack of an index prevents the Chukwa front end from serving data straight from HDFS.  This jira is the placeholder for designing an indexing service for Chukwa.  The plan is to create the indexing service based on available software like lucene or katta.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.