Posted to user@chukwa.apache.org by DKN <de...@in.ibm.com> on 2011/06/13 10:57:41 UTC

How to use TsProcessor ?

I am wondering whether TsProcessor is a generic demux processor. Is there any
documentation on what TsProcessor is, and is there a customizable way of using
it for user-defined data types? I wanted to extend a demux processor for log
processing and learned from this wiki page that TsProcessor is the default:
http://wiki.apache.org/hadoop/DemuxModification

I also couldn't find its table definition for HBase (in hbase.schema).
Where can I find the schema so that I can create the table definitions in
HBase?

Thanks in advance. 

Cheers, DKN

--
View this message in context: http://apache-chukwa.679492.n3.nabble.com/How-to-use-TsProcessor-tp3058006p3058006.html
Sent from the Chukwa - Users mailing list archive at Nabble.com.

Re: How to use TsProcessor ?

Posted by Eric Yang <ey...@yahoo-inc.com>.
A demux parser is mapped to HBase in this form:

ClassName/DataType = TableName
ReducerType = Column Family.

The row key is composed of:

Time Partition/Primary Key/Timestamp

Example:

The SystemMetrics data type is emitted by the SystemMetrics adaptor. In the org.apache.hadoop.chukwa.extraction.demux.processor.mapper.SystemMetrics demux parser, the table name is mapped to "SystemMetrics". When building a record for "network", the parser extracts the data from the Chukwa chunk and sets the reduceType to "network". The data has a field called "RxBytes", which gets written to the HBase table SystemMetrics, column family network, column name RxBytes.

This is a hack to ensure the demux parsers stay backward compatible with MapReduce on sequence files. We might want to redesign this at some point in the future.
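
For readers following along, the mapping described above can be sketched in a few lines of Python. This is only an illustration, not Chukwa code: the function name hbase_cell, the "/" separator in the row key, and the choice of the start of the UTC day as the time partition are all assumptions made for the example.

```python
DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one day


def hbase_cell(parser_name, reduce_type, primary_key, timestamp_ms, field, value):
    """Map one parsed field to a (table, row, column, value) description.

    parser_name -> table name
    reduce_type -> column family
    field       -> column name
    """
    # Assumption: the time partition is the timestamp truncated to the UTC day.
    time_partition = timestamp_ms - (timestamp_ms % DAY_MS)
    # Assumption: the three row key parts are joined with "/".
    row_key = f"{time_partition}/{primary_key}/{timestamp_ms}"
    return {
        "table": parser_name,
        "row": row_key,
        "column": f"{reduce_type}:{field}",
        "value": value,
    }


cell = hbase_cell("SystemMetrics", "network", "host1", 1307959061000, "RxBytes", 1024)
print(cell["table"], cell["column"], cell["row"])
```

Under these assumptions the RxBytes sample ends up in table SystemMetrics under column network:RxBytes, which matches the example in the message.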

Regards,
Eric


Re: How to use TsProcessor ?

Posted by Bill Graham <bi...@gmail.com>.
Glad to hear I'm not the only one slow on Monday mornings...

Linking the processor name to the table name as a default makes sense, but
we should build in a way to override that in a config. Same thing for the
column family naming.

Also, how does the datatype get represented in HBase? I'm trying to
understand how multiple datatypes would be stored and queried in the same
system.




Re: How to use TsProcessor ?

Posted by Eric Yang <ey...@yahoo-inc.com>.
Hi Bill,

Must be early Monday morning. I was reading the javadoc, but misread it. Please discard my proposal.
The schema should be:

TableName: TsProcessor
ColumnFamily: log

Regards,
Eric

Re: How to use TsProcessor ?

Posted by Bill Graham <bi...@gmail.com>.
> TsProcessor.time.regex.[some_data_type] maps to ColumnFamily

Eric, you lost me here. Why does the regular expression map to a column
family?



Re: How to use TsProcessor ?

Posted by Eric Yang <ey...@yahoo-inc.com>.
The HBase schema is defined by annotations in the demux parsers. TsProcessor is a generic parser that does not target a specific data type, and HBaseWriter currently does not handle this generic parser well.
The current implementation writes data processed by TsProcessor to:

TableName: TsProcessor
ColumnFamily: log
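
If the table has to be created by hand, a statement for the HBase shell can be derived from the table name and column family given above. The sketch below only builds the text of a basic create command. The helper name create_statement is made up for the example, and any per-family options (versions, compression and so on) that a real schema file may set are left out.

```python
def create_statement(table, column_families):
    """Build a basic HBase shell 'create' command for one table."""
    if not column_families:
        raise ValueError("an HBase table needs at least one column family")
    families = ", ".join(f"'{cf}'" for cf in column_families)
    return f"create '{table}', {families}"


# Table and column family as given in this message.
print(create_statement("TsProcessor", ["log"]))
```

The printed line can be pasted into the HBase shell once the names have been checked against the schema shipped with the Chukwa release in use.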

It may be possible to add handling of TsProcessor with this specification:

Chunk DataType maps to TableName
TsProcessor.time.regex.[some_data_type] maps to ColumnFamily

If this is a reasonable implementation, please file a JIRA. Thanks.

Regards,
Eric

Re: How to use TsProcessor ?

Posted by Bill Graham <bi...@gmail.com>.
Apologies, but the documentation around processor configs is somewhat
outdated (CHUKWA-538).

For one, TsProcessor is not the default. DefaultProcessor is. You can change
this with the chukwa.demux.mapper.default.processor setting.

https://issues.apache.org/jira/browse/CHUKWA-473
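
To make the setting above concrete, here is a small Python sketch that reads a Hadoop-style property fragment and resolves the default processor. It is an illustration under assumptions, not a verified configuration: the fragment is hypothetical, the fully qualified class name assumes TsProcessor lives in the org.apache.hadoop.chukwa.extraction.demux.processor.mapper package, and the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a demux configuration file.
DEMUX_CONF = """
<configuration>
  <property>
    <name>chukwa.demux.mapper.default.processor</name>
    <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.TsProcessor</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the fragment."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def default_processor(props, fallback="DefaultProcessor"):
    """Use the configured default processor, else fall back to DefaultProcessor."""
    return props.get("chukwa.demux.mapper.default.processor", fallback)


print(default_processor(read_properties(DEMUX_CONF)))
print(default_processor({}))
```

The second call shows the behaviour described above: with no override present, DefaultProcessor is used.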

Also, the docs don't cover some enhancements that make TsProcessor more
multi-purpose. It can handle any data type passed to it as long as it can
parse out the date. See this JIRA for how to override the default or the
per-dataType date parsing logic:

https://issues.apache.org/jira/browse/CHUKWA-472
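
The idea of a default date parser with per-dataType overrides can be sketched as follows. This is a Python illustration of the lookup order only, not the TsProcessor implementation: the property keys are modelled on the "TsProcessor.time.regex.[some_data_type]" style that appears elsewhere in this thread, and the data type name MyAppLog, the regular expressions and the date formats are all assumptions. The JIRA above is the authority on the real property names.

```python
import re
from datetime import datetime

# Hypothetical settings; keys and values are assumptions for this example.
CONF = {
    "TsProcessor.default.time.regex": r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})",
    "TsProcessor.default.time.format": "%Y-%m-%d %H:%M:%S,%f",
    "TsProcessor.time.regex.MyAppLog": r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})",
    "TsProcessor.time.format.MyAppLog": "%d/%m/%Y %H:%M:%S",
}


def parse_timestamp(data_type, line, conf=CONF):
    """Extract the record time, preferring per-data-type settings over defaults."""
    regex = conf.get("TsProcessor.time.regex." + data_type,
                     conf["TsProcessor.default.time.regex"])
    fmt = conf.get("TsProcessor.time.format." + data_type,
                   conf["TsProcessor.default.time.format"])
    match = re.search(regex, line)
    if match is None:
        raise ValueError("no timestamp found for data type " + data_type)
    return datetime.strptime(match.group(1), fmt)


# A data type without overrides falls back to the default settings.
print(parse_timestamp("SomeOtherLog", "2011-06-13 10:57:41,123 INFO started"))
# A data type with its own regex and format uses those instead.
print(parse_timestamp("MyAppLog", "pid=42 13/06/2011 10:57:41 request served"))
```

A line whose date cannot be found raises an error here; how the real processor treats such records is not covered by this sketch.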


Eric is the best person to field the HBase schema question.

