Posted to user@chukwa.apache.org by Bill Graham <bi...@gmail.com> on 2009/12/22 22:36:08 UTC

How to deploy a custom processor to demux

Hi,

I've written my own Processor to handle my log format per this wiki and I've
run into a couple of gotchas:
http://wiki.apache.org/hadoop/DemuxModification

1. The default processor is not the TsProcessor as documented, but the
DefaultProcessor (see line 83 of Demux.java). This causes headaches because
when using DefaultProcessor data always goes under minute "0" in HDFS,
regardless of when in the hour it was created.

2. When implementing a custom parser as shown in the wiki, how do you
register the class so it gets included in the job that's submitted to the
hadoop cluster? The only way I've been able to do this is to put my class in
the package org.apache.hadoop.chukwa.extraction.demux.processor.mapper and
then manually add that class to the chukwa-core-0.3.0.jar that is on my
data processor, which is a pretty rough hack. Otherwise, I get class not
found exceptions in my mapper.

thanks,
Bill

Re: How to deploy a custom processor to demux

Posted by Eric Yang <ey...@yahoo-inc.com>.
I think this was the JIRA that you were talking about, Ari.

https://issues.apache.org/jira/browse/CHUKWA-292

Both mapper and reducer should be able to live in another namespace.  The
limitation is in the current code.  CHUKWA-422 is the current plan to update
Demux for Hadoop 0.20+, and we can make it more user-friendly as part of that
work.
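
To illustrate what "living in another namespace" could look like, here is a
minimal Java sketch of a name resolver that accepts either a fully qualified
class name or a bare name that falls back to a default package. This is not
Chukwa's actual code: ProcessorNameResolver, resolve, and the rule "a name
containing a dot is fully qualified" are assumptions made for illustration.
Only the default package string comes from this thread.

```java
// Hypothetical sketch, not Chukwa code: turn a configured processor name
// into a fully qualified class name.
final class ProcessorNameResolver {
    // Package named in this thread as the one custom processors must use today.
    static final String DEFAULT_PACKAGE =
        "org.apache.hadoop.chukwa.extraction.demux.processor.mapper";

    private ProcessorNameResolver() {}

    static String resolve(String configuredName) {
        if (configuredName == null || configuredName.trim().isEmpty()) {
            throw new IllegalArgumentException("processor name must not be empty");
        }
        String name = configuredName.trim();
        // Assumption: a dot means the user supplied a fully qualified name,
        // so it is used as-is; a bare name gets the default package prepended.
        return name.contains(".") ? name : DEFAULT_PACKAGE + "." + name;
    }
}
```

With a rule like this, existing configurations that use bare names keep
working, while a custom class in any package can be referenced by its full
name.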

Regards,
Eric




Re: How to deploy a custom processor to demux

Posted by as...@gmail.com.
I would say that limitation is a bug. I remember opening a JIRA for it
once. Shouldn't be too hard to fix...



Re: How to deploy a custom processor to demux

Posted by Christian <en...@gmail.com>.
Hi Eric,

This isn't an issue for me yet, but I can see it becoming one if we end up
using Chukwa (currently investigating it). What's the reason behind
requiring the package name to be
org.apache.hadoop.chukwa.extraction.demux.processor.mapper?

Regards,
Chris



Re: How to deploy a custom processor to demux

Posted by Eric Yang <ey...@yahoo-inc.com>.
I thought this was the current implementation.  The class file should be in
the same package, but it is not required to be in the same jar file.
If it is not working, please file a JIRA.

Regards,
Eric




Re: How to deploy a custom processor to demux

Posted by Bill Graham <bi...@gmail.com>.
> The extensions could be added by adding the class to the class path of the
> demux process.  If you put your jar file in CHUKWA_HOME/lib and update
> chukwa-demux-conf.xml, then it should work automatically.

Just to clarify, are you saying this is how it currently works or how it
could work in the future?

Currently it doesn't work this way, which is the point of my post. I put a
jar containing my processor in the lib/ directory of my data processor,
mapped it in chukwa-demux.xml, and bounced the data processor. I ran ps and
saw the jar in the DemuxManager classpath, but I still got
ClassNotFoundExceptions.

The only way I could get it to work was to do the following:
- Move my class into
org.apache.hadoop.chukwa.extraction.demux.processor.mapper
- Add my compiled class to the chukwa-core jar.
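
One way to narrow down a ClassNotFoundException like this is to check, inside
the JVM that actually throws it, whether the class resolves and which jar it
was loaded from. A plausible explanation (an assumption, not confirmed in this
thread) is that the jar is visible to the DemuxManager JVM but is not shipped
to the map task JVMs on the cluster. The sketch below uses only standard Java
reflection; ClassLocator and describe are hypothetical names.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports whether a class name resolves in
// the current JVM and, if it does, where it was loaded from.
final class ClassLocator {
    private ClassLocator() {}

    static String describe(String className) {
        // Prefer the thread context loader, which frameworks often set up
        // for user code; fall back to the loader of this helper.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLocator.class.getClassLoader();
        }
        try {
            // 'false' avoids running static initializers of the target class.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // JDK core classes typically have no code source.
            String location = (source == null || source.getLocation() == null)
                ? "bootstrap/unknown"
                : source.getLocation().toString();
            return "FOUND " + className + " from " + location;
        } catch (ClassNotFoundException e) {
            return "MISSING " + className;
        }
    }
}
```

Logging describe(...) for the processor class from both the process that
submits the job and from a map task would show whether the two JVMs see the
same jars.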



Re: How to deploy a custom processor to demux

Posted by Eric Yang <ey...@yahoo-inc.com>.


On 12/22/09 2:40 PM, "Bill Graham" <bi...@gmail.com> wrote:

> Thanks for your quick reply Eric.
> 
> The TsProcessor does use buildGenericRecord and has been working fine for me
> (at least I thought it was). I've mapped it to my dataType as you described
> without problems. My only point with issue #1 was just that the documentation
> is off and that the DefaultProcessor yields what I think is unexpected
> behavior.
>

I will update the documentation to align with the code.  Thank you for
finding this.
 
> Yes, annotations would be useful. Or what about just having an extensions
> directory (maybe lib/ext/) or something similar where custom jars could be
> placed that are to be submitted by demux M/R? Do you know where the code
> resides that handles adding the chukwa-core jar? I poked around a bit but
> couldn't find it.
> 
> Finally, is there a JIRA for this issue that you know of? If not I'll create
> one. This is going to become a pain point for us soon, so if we have a design
> I might be able to contribute a patch.

The extensions could be added by adding the class to the class path of the
demux process.  If you put your jar file in CHUKWA_HOME/lib and update
chukwa-demux-conf.xml, then it should work automatically.  We probably
should have a jira to document this.  Please go ahead and file one.

For your interest, the annotation jira is:

https://issues.apache.org/jira/browse/CHUKWA-371

Regards,
Eric



Re: How to deploy a custom processor to demux

Posted by Bill Graham <bi...@gmail.com>.
Thanks for your quick reply Eric.

The TsProcessor does use buildGenericRecord and has been working fine for me
(at least I thought it was). I've mapped it to my dataType as you described
without problems. My only point with issue #1 was just that the
documentation is off and that the DefaultProcessor yields what I think is
unexpected behavior.

> There is a plan to load the parser class from the class path by using Java
> annotations. It is still in the initial phase of planning. Design
> participation is welcome.

Yes, annotations would be useful. Or what about just having an extensions
directory (maybe lib/ext/) or something similar where custom jars could be
placed that are to be submitted by demux M/R? Do you know where the code
resides that handles adding the chukwa-core jar? I poked around a bit but
couldn't find it.

Finally, is there a JIRA for this issue that you know of? If not I'll create
one. This is going to become a pain point for us soon, so if we have a
design I might be able to contribute a patch.

thanks,
Bill



Re: How to deploy a custom processor to demux

Posted by Eric Yang <ey...@yahoo-inc.com>.
On 12/22/09 1:36 PM, "Bill Graham" <bi...@gmail.com> wrote:

> I've written my own Processor to handle my log format per this wiki and I've
> run into a couple of gotchas:
> http://wiki.apache.org/hadoop/DemuxModification
> 
> 1. The default processor is not the TsProcessor as documented, but the
> DefaultProcessor (see line 83 of Demux.java). This causes headaches because
> when using DefaultProcessor data always goes under minute "0" in HDFS,
> regardless of when in the hour it was created.
> 

There is a generic method to build the record:

buildGenericRecord(record, recordEntry, timestamp, recordType);

This method builds a key of the form:

Time partition/Primary Key/timestamp

When all records are rolled up into a large sequence file at the end of the
hour and the end of the day, the sequence file is sorted by time partition
and primary key.  This data layout was put in place to assist data
scanning.  When data is retrieved, use record.getTimestamp() to find the
real timestamp for the record.

TsProcessor is incomplete for now because the key in ChukwaRecord is used
in the hourly and daily roll-ups.  Without buildGenericRecord, the hourly and
daily roll-ups will not work correctly.
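
The key layout described above can be sketched in plain Java. This is a
standalone model, not the implementation of buildGenericRecord:
RecordKeySketch, hourPartition, buildKey, and the choice of an hourly
partition computed by truncating epoch milliseconds are all assumptions. It
shows how every record from the same hour shares one partition prefix while
the exact timestamp is preserved at the end of the key.

```java
// Hypothetical model of a "time partition/primary key/timestamp" key.
final class RecordKeySketch {
    private static final long HOUR_MILLIS = 60L * 60L * 1000L;

    private RecordKeySketch() {}

    // Truncate an epoch-millisecond timestamp down to the start of its hour.
    static long hourPartition(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, HOUR_MILLIS);
    }

    // Join the three parts with '/' in the order given in the message above.
    static String buildKey(long timestampMillis, String primaryKey) {
        return hourPartition(timestampMillis) + "/" + primaryKey + "/" + timestampMillis;
    }
}
```

Sorting such keys as strings groups records by partition first and primary
key second, which matches the sort order described for the rolled-up
sequence files.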

> 2. When implementing a custom parser as shown in the wiki, how do you register
> the class so it gets included in the job that's submitted to the hadoop
> cluster? The only way I've been able to do this is to put my class in the
> package org.apache.hadoop.chukwa.extraction.demux.processor.mapper and then
> manually add that class to the chukwa-core-0.3.0.jar that is on my data
> processor, which is a pretty rough hack. Otherwise, I get class not found
> exceptions in my mapper.

The demux process is controlled by $CHUKWA_HOME/conf/chukwa-demux-conf.xml,
which maps the recordType to your parser class.  There is a plan to load the
parser class from the class path by using Java annotations.  It is still in
the initial phase of planning.  Design participation is welcome.  Hope this
helps.  :)
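
The mapping described above can be modeled as a lookup from record type to
parser class name with a fallback. The sketch uses java.util.Properties as a
stand-in for chukwa-demux-conf.xml; ProcessorLookup, the record type
MyLogType, and the class com.example.MyProcessor are hypothetical. The
DefaultProcessor fallback follows Bill's observation about Demux.java rather
than anything verified here.

```java
import java.util.Properties;

// Hypothetical model: pick a parser class name for a record type,
// falling back to a default when the type has no explicit mapping.
final class ProcessorLookup {
    static final String DEFAULT_PROCESSOR = "DefaultProcessor";

    private final Properties mapping;

    ProcessorLookup(Properties mapping) {
        this.mapping = mapping;
    }

    String processorFor(String recordType) {
        // getProperty returns the second argument when the key is absent.
        return mapping.getProperty(recordType, DEFAULT_PROCESSOR);
    }
}
```

A caller would populate the Properties with one entry per record type, for
example MyLogType mapped to com.example.MyProcessor, and any unmapped type
would then resolve to the default.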

Regards,
Eric