Posted to user@chukwa.apache.org by Ariel Rabkin <as...@gmail.com> on 2010/12/28 03:12:19 UTC

extending demux

Howdy.

I'm gearing up to make use of the new Demux framework. I have several
site-specific metrics that I want to use Chukwa to collect and graph.

I'm a little vague about how to do this.  I think I see what the HBase
metric creation needs to be. But what do I need to do in the way of
Demux processors?

What input format does HICC expect / what's the output format supposed
to be?  Which are the right examples for me to look at? Is anything
documented yet? Who has done this already?


-Ari

-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: extending demux

Posted by Eric Yang <ey...@yahoo-inc.com>.

Hi DKN,

The Chukwa ant build system could use some improvements, such as a feature like mvn eclipse:eclipse to generate project files for Eclipse.  I filed https://issues.apache.org/jira/browse/CHUKWA-590 to tackle this issue.  For software built on top of Chukwa, it would be nice to add "powered by Chukwa" to advertise and attract more developers.  Thanks for the feedback.  :)

Regards,
Eric


Re: extending demux

Posted by DKN <de...@in.ibm.com>.

Eric, thanks for summarizing it here. I am now able to create my parsers
successfully and see the results in HICC (using the trunk). I had to work
out my own development methodology, though, using Eclipse.

A few questions:

1. When is the reducer parser invoked? What is the purpose of the reducer,
   and are there any best practices around it?
2. The HICC UI sets the start time to (now - an hour), which makes it not
   quite real time. Is there a way to override this so that we start seeing
   data as soon as it gets populated in HBase? I think this should be
   possible, given that HBase stores the latest information.
3. If I want to generate my own reports and use the built-in 'flot' library,
   are there examples that I can look up in the trunk?

Being able to extend demux with our custom parsers appears really promising
and opens up a lot of opportunities. We will need to see how best to make
this framework customizable for custom-defined data types.

Many thanks ..  DKN 

--
View this message in context: http://apache-chukwa.679492.n3.nabble.com/extending-demux-tp2154571p3062409.html
Sent from the Chukwa - Users mailing list archive at Nabble.com.

Re: extending demux

Posted by Eric Yang <er...@gmail.com>.

Hi Ari,

The Demux framework has been modified to operate in two modes.  First,
the map reduce mode is fully backward compatible with Chukwa 0.4 demux.
Second, the Chukwa collector uses HBaseWriter, which implements its own
OutputCollector and invokes the demux parsers.  This makes it easy to
write one parser which works in both modes.

Take a look at
org.apache.hadoop.chukwa.extraction.demux.processor.mapper.SystemMetrics.
All demux parsers extend the AbstractProcessor class and implement the
parse function.  The inputs to the parse function are the Chukwa chunk
as a string, an output collector, and a reporter.

There is also a special helper function:

buildGenericRecord(ChukwaRecord record, String body, long timestamp,
String reduceType);

ChukwaRecord is basically a HashMap, and records are grouped by
reduceType, timestamp, and primary key (i.e. csource).  In HBase mode,
reduceType maps to the column family name, and timestamp + primary key
maps to the row key.  The table name is defined by an annotation at the
beginning of the class.  HBaseWriter's OutputCollector takes the output
emitted by the parse function and puts the records into HBase.
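
To make the mapping concrete, here is a rough, self-contained Java sketch.
It does not use Chukwa's classes: RecordMappingSketch, HBaseCell, the
"SiteMetrics" table, and the field names are all hypothetical, and the
"<timestamp>-<primaryKey>" row key layout is an assumption (the separator
HBaseWriter really uses may differ).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; these are not Chukwa's actual classes. */
public class RecordMappingSketch {

    /** Where one record would land in HBase under the mapping described above. */
    public static final class HBaseCell {
        public final String table;
        public final String rowKey;
        public final String columnFamily;
        public final Map<String, String> columns;

        HBaseCell(String table, String rowKey, String columnFamily,
                  Map<String, String> columns) {
            this.table = table;
            this.rowKey = rowKey;
            this.columnFamily = columnFamily;
            this.columns = columns;
        }
    }

    // Assumed layout: timestamp, then a dash, then the primary key (csource).
    // Only "timestamp + primary key" comes from the thread; the dash is a guess.
    public static String rowKey(long timestamp, String primaryKey) {
        return timestamp + "-" + primaryKey;
    }

    // reduceType becomes the column family; the record's fields become columns.
    public static HBaseCell toCell(String table, String reduceType, long timestamp,
                                   String primaryKey, Map<String, String> fields) {
        return new HBaseCell(table, rowKey(timestamp, primaryKey), reduceType,
                             new LinkedHashMap<>(fields));
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("cpu_user", "12.5");
        fields.put("cpu_system", "3.1");

        HBaseCell cell = toCell("SiteMetrics", "cpu", 1293505939000L,
                                "host1.example.com", fields);
        System.out.println(cell.table + " / " + cell.rowKey + " / "
                + cell.columnFamily + ":" + cell.columns);
    }
}
```

The point is only that one (reduceType, timestamp, primary key) group ends
up as one row in one column family of the annotated table.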

In summary, to develop a demux processor:

1. Extend AbstractProcessor
2. Annotate table name
3. Implement parse function
4. Configure chukwa-demux-conf.xml to map data type to the new Parser
5. Create the HBase schema
6. Restart the collector with the new jar, then watch the data flow in and
   show up in HICC
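
Steps 1 to 3 have roughly the following shape.  This is a self-contained
sketch with stand-in types, so it runs without Chukwa on the classpath:
TableName, Collector, ProcessorBase, and SiteMetricsProcessor are
hypothetical names standing in for the real annotation, OutputCollector,
and AbstractProcessor, and the "<epochMillis> key=value ..." input format
is an assumption.  Steps 4 to 6 are configuration and deployment, so they
are not shown.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins that mirror the shape of a demux processor. */
public class DemuxProcessorSketch {

    // Stand-in for the table-name annotation (step 2).
    @Retention(RetentionPolicy.RUNTIME)
    @interface TableName {
        String value();
    }

    // Stand-in for the output collector handed to parse().
    interface Collector {
        void collect(String reduceType, long timestamp, Map<String, String> record);
    }

    // Stand-in for AbstractProcessor (step 1).
    abstract static class ProcessorBase {
        protected abstract void parse(String chunkBody, Collector output) throws Exception;
    }

    @TableName("SiteMetrics")
    static class SiteMetricsProcessor extends ProcessorBase {
        // Step 3: turn one chunk body into a record and hand it to the collector.
        @Override
        protected void parse(String chunkBody, Collector output) {
            // Assumed input: "<epochMillis> key=value key=value ..."
            String[] tokens = chunkBody.trim().split("\\s+");
            long timestamp = Long.parseLong(tokens[0]);
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 1; i < tokens.length; i++) {
                String[] kv = tokens[i].split("=", 2);
                if (kv.length == 2) {
                    record.put(kv[0], kv[1]);
                }
            }
            output.collect("site", timestamp, record);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> seen = new ArrayList<>();
        ProcessorBase processor = new SiteMetricsProcessor();
        // Read the table name back from the annotation, as a writer might.
        String table = SiteMetricsProcessor.class.getAnnotation(TableName.class).value();
        processor.parse("1293505939000 requests=42 errors=1",
                (type, ts, rec) -> seen.add(table + " " + type + " " + ts + " " + rec));
        seen.forEach(System.out::println);
    }
}
```

Because the processor only talks to the collector interface, the same
parse code can be driven by either a map reduce job or a writer, which is
the idea behind having one parser work in both modes.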

regards,
Eric
