You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@chukwa.apache.org by Bill Graham <bi...@gmail.com> on 2010/09/20 23:58:05 UTC

Cluster-specific Adaptors

Hi,

In CHUKWA-515 we discussed the possibility being able to add an
adaptor bound to a given cluster:

https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811

I can actually see this being useful, especially now that it's easier
to add/remove agents with the Adaptor REST API. Looking into the code
it doesn't seem like it would be that hard to do, but I want to make
sure I'm not overlooking anything.

It seems like we could support this with a few small changes:

- Add the concept of tags to the Adaptor interface.
- AbstractAdator would support a getTags method which would return the
union of tags set on the Adaptor and the default tags on the
DataFactory.
- Internal tag implementations on each would change to store tags in
maps, instead of concat'ed strings. This would allow for a "last in
wins" type of functionality so tags could be overriden. This assumes
of course that there should never be more than one of the same tag key
value, which I _assume_ is the case.
- The ChunkImpl constructor will call getTags on the agent, instead of
getDefaultTags the data factory.

The trickiest part as I see it is figuring out how to change the add
adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
both makes sense and doesn't break things. In it's current form it
doesn't have room for easy expansion except at the end of the line:

add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
<initial offset>

Any thoughts or suggestions? There's also a potential gotcha with all
the impls of Adaptor.parseArgs either breaking or needing to
change....

thanks,
Bill

Re: Cluster-specific Adaptors

Posted by Bill Graham <bi...@gmail.com>.
Keeping state for a use case like this adds unnecessary complexity in
both the implementation of the client/server and the protocol. Also,
the protocol would be state-full in this case not because doing so
would make it easier to use protocol, but to work around the limits of
the existing protocol, which really could just be changed.

>  It's easy to know when the state can be discarded

I disagree. If I make two HTTP requests the underlying HTTP
client/browser could use one or more connections for example and it
puts an unnecessary burden on the client to track this. You'd instead
need to keep a token to identify each user to bind their state to,
which would need to be passed around, which is a pain. Then how does
he server and the client know when this state is expiring? What if the
server gets bounced? Race conditions? Etc. Yes, there are technical
solutions to all of these things, but they all come at a way more
complex cost to implement and use than I think we need to handle
per-adaptor tags.


On Tue, Sep 21, 2010 at 11:54 AM, Ariel Rabkin <as...@gmail.com> wrote:
> Why is it bad to keep some state per command session?  It's easy to
> know when the state can be discarded -- as soon as we're done reading
> the file or when the socket closes.
>
> I think it's fairly intuitive and readable; it's routine for e.g.,
> scripts to modify interpreter state.  In straight-line code, which is
> all we're ever going to have in the control protocol, this is very
> easy to reason about.
>
> --Ari
>
> On Tue, Sep 21, 2010 at 11:17 AM, Bill Graham <bi...@gmail.com> wrote:
>> +1 on staying stateless.
>>
>> I think the challenge we're facing is that we're trying to support a
>> syntax that is simple and readable and can be done with a single line
>> (i.e. for the initial_adaptors file, the telnet API, the command line,
>> etc), but the configs can potentially be not-so-simple.
>>
>> For example, here's how you might configure the JMS adaptor which used
>> dependency injection. That's a lot for a single line and there's
>> nowhere to add new global configs in front of the adaptor specific
>> configs without breaking things.
>>
>> add jms.JMSAdaptor jms-events
>> failover:(tcp://jms-host.foo.com:61616,tcp://jms-host.foo.com:61616)
>> -q some.queue.name -s "id_type IN ('162')" -x
>> org.apache.hadoop.chukwa.datacollection.adaptor.
>> jms.JMSMessagePropertyTransformer -p
>> "event_time,id_type,id,srcurl,xref,xrq,title -r event_time,id_type,id"
>> 0
>>
>> What if we were to adopt a few flags into the syntax:
>>
>> add [name =] <adaptor_class_name> <datatype> [--tags <tags>]
>> [--adaptor-params <adaptor specific params>|--adaptor-config-file
>> <file>]
>> <initial offset>
>>
>> The '--*' flags could be reserved. This would allow us to keep with a
>> one-line syntax where that approach works, but allow for expansion.
>> Also, if an adaptor config got to complex, those configs could be
>> specified in a file if needed.
>>
>>
>> On Mon, Sep 20, 2010 at 9:52 PM, Jerome Boulon <jb...@netflix.com> wrote:
>>> Hi,
>>> If I had to implement this, I will add an extra parameter
>>> (?extraParams=xyz).
>>> The adaptorImp will be the only one responsible for parsing this adaptor’s
>>> specific info.
>>> I don’t think that we could/should add new complexity in the parsing.
>>> The same think should be done for getCurrentStatus(), a public result, that
>>> is the same for all adaptors in order to know if the adaptors is working or
>>> not and a private section that will give extra information.
>>>
>>> Also, moving to a json input should simplify everything.
>>> /Jerome.
>>>
>>> On 9/20/10 5:15 PM, "Bill Graham" <bi...@gmail.com> wrote:
>>>
>>> I'd like to hear Ari's take on this, but this does feel a bit hacky to
>>> me. Plus, it would put the responsibility of parsing tags on each
>>> adaptor impl and would require a refactor of how each one currently
>>> parses args.
>>>
>>> Actually, we might be able to intercept the call to parseArgs in
>>> AbstractAdaptor and pull out the tags if they exist and pass the rest
>>> to the subclass, which would be none the wiser. Not the cleanest, but
>>> at lease not as intrusive on the adaptor implementations.
>>>
>>> Ari, also what about the getCurrentStatus() method? I'd think all the
>>> impls would somehow need to incorporate tags into that response as
>>> well, since AFAIR that's what's used to do Adaptor SerDe with the
>>> checkpoints file.
>>>
>>>
>>> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>>>> Hi Bill,
>>>>
>>>> This might be hacky but it should be possible to have adaptor specific
>>>> params to include tags.  Ari, what do you think?
>>>>
>>>> Regards,
>>>> Eric
>>>>
>>>> On 9/20/10 2:58 PM, "Bill Graham" <bi...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> In CHUKWA-515 we discussed the possibility being able to add an
>>>> adaptor bound to a given cluster:
>>>>
>>>>
>>>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>>>>
>>>> I can actually see this being useful, especially now that it's easier
>>>> to add/remove agents with the Adaptor REST API. Looking into the code
>>>> it doesn't seem like it would be that hard to do, but I want to make
>>>> sure I'm not overlooking anything.
>>>>
>>>> It seems like we could support this with a few small changes:
>>>>
>>>> - Add the concept of tags to the Adaptor interface.
>>>> - AbstractAdator would support a getTags method which would return the
>>>> union of tags set on the Adaptor and the default tags on the
>>>> DataFactory.
>>>> - Internal tag implementations on each would change to store tags in
>>>> maps, instead of concat'ed strings. This would allow for a "last in
>>>> wins" type of functionality so tags could be overriden. This assumes
>>>> of course that there should never be more than one of the same tag key
>>>> value, which I _assume_ is the case.
>>>> - The ChunkImpl constructor will call getTags on the agent, instead of
>>>> getDefaultTags the data factory.
>>>>
>>>> The trickiest part as I see it is figuring out how to change the add
>>>> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
>>>> both makes sense and doesn't break things. In it's current form it
>>>> doesn't have room for easy expansion except at the end of the line:
>>>>
>>>> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
>>>> <initial offset>
>>>>
>>>> Any thoughts or suggestions? There's also a potential gotcha with all
>>>> the impls of Adaptor.parseArgs either breaking or needing to
>>>> change....
>>>>
>>>> thanks,
>>>> Bill
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
>

Re: Cluster-specific Adaptors

Posted by Ariel Rabkin <as...@gmail.com>.
Why is it bad to keep some state per command session?  It's easy to
know when the state can be discarded -- as soon as we're done reading
the file or when the socket closes.

I think it's fairly intuitive and readable; it's routine for e.g.,
scripts to modify interpreter state.  In straight-line code, which is
all we're ever going to have in the control protocol, this is very
easy to reason about.

--Ari

On Tue, Sep 21, 2010 at 11:17 AM, Bill Graham <bi...@gmail.com> wrote:
> +1 on staying stateless.
>
> I think the challenge we're facing is that we're trying to support a
> syntax that is simple and readable and can be done with a single line
> (i.e. for the initial_adaptors file, the telnet API, the command line,
> etc), but the configs can potentially be not-so-simple.
>
> For example, here's how you might configure the JMS adaptor which used
> dependency injection. That's a lot for a single line and there's
> nowhere to add new global configs in front of the adaptor specific
> configs without breaking things.
>
> add jms.JMSAdaptor jms-events
> failover:(tcp://jms-host.foo.com:61616,tcp://jms-host.foo.com:61616)
> -q some.queue.name -s "id_type IN ('162')" -x
> org.apache.hadoop.chukwa.datacollection.adaptor.
> jms.JMSMessagePropertyTransformer -p
> "event_time,id_type,id,srcurl,xref,xrq,title -r event_time,id_type,id"
> 0
>
> What if we were to adopt a few flags into the syntax:
>
> add [name =] <adaptor_class_name> <datatype> [--tags <tags>]
> [--adaptor-params <adaptor specific params>|--adaptor-config-file
> <file>]
> <initial offset>
>
> The '--*' flags could be reserved. This would allow us to keep with a
> one-line syntax where that approach works, but allow for expansion.
> Also, if an adaptor config got to complex, those configs could be
> specified in a file if needed.
>
>
> On Mon, Sep 20, 2010 at 9:52 PM, Jerome Boulon <jb...@netflix.com> wrote:
>> Hi,
>> If I had to implement this, I will add an extra parameter
>> (?extraParams=xyz).
>> The adaptorImp will be the only one responsible for parsing this adaptor’s
>> specific info.
>> I don’t think that we could/should add new complexity in the parsing.
>> The same think should be done for getCurrentStatus(), a public result, that
>> is the same for all adaptors in order to know if the adaptors is working or
>> not and a private section that will give extra information.
>>
>> Also, moving to a json input should simplify everything.
>> /Jerome.
>>
>> On 9/20/10 5:15 PM, "Bill Graham" <bi...@gmail.com> wrote:
>>
>> I'd like to hear Ari's take on this, but this does feel a bit hacky to
>> me. Plus, it would put the responsibility of parsing tags on each
>> adaptor impl and would require a refactor of how each one currently
>> parses args.
>>
>> Actually, we might be able to intercept the call to parseArgs in
>> AbstractAdaptor and pull out the tags if they exist and pass the rest
>> to the subclass, which would be none the wiser. Not the cleanest, but
>> at lease not as intrusive on the adaptor implementations.
>>
>> Ari, also what about the getCurrentStatus() method? I'd think all the
>> impls would somehow need to incorporate tags into that response as
>> well, since AFAIR that's what's used to do Adaptor SerDe with the
>> checkpoints file.
>>
>>
>> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>>> Hi Bill,
>>>
>>> This might be hacky but it should be possible to have adaptor specific
>>> params to include tags.  Ari, what do you think?
>>>
>>> Regards,
>>> Eric
>>>
>>> On 9/20/10 2:58 PM, "Bill Graham" <bi...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> In CHUKWA-515 we discussed the possibility being able to add an
>>> adaptor bound to a given cluster:
>>>
>>>
>>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>>>
>>> I can actually see this being useful, especially now that it's easier
>>> to add/remove agents with the Adaptor REST API. Looking into the code
>>> it doesn't seem like it would be that hard to do, but I want to make
>>> sure I'm not overlooking anything.
>>>
>>> It seems like we could support this with a few small changes:
>>>
>>> - Add the concept of tags to the Adaptor interface.
>>> - AbstractAdator would support a getTags method which would return the
>>> union of tags set on the Adaptor and the default tags on the
>>> DataFactory.
>>> - Internal tag implementations on each would change to store tags in
>>> maps, instead of concat'ed strings. This would allow for a "last in
>>> wins" type of functionality so tags could be overriden. This assumes
>>> of course that there should never be more than one of the same tag key
>>> value, which I _assume_ is the case.
>>> - The ChunkImpl constructor will call getTags on the agent, instead of
>>> getDefaultTags the data factory.
>>>
>>> The trickiest part as I see it is figuring out how to change the add
>>> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
>>> both makes sense and doesn't break things. In it's current form it
>>> doesn't have room for easy expansion except at the end of the line:
>>>
>>> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
>>> <initial offset>
>>>
>>> Any thoughts or suggestions? There's also a potential gotcha with all
>>> the impls of Adaptor.parseArgs either breaking or needing to
>>> change....
>>>
>>> thanks,
>>> Bill
>>>
>>>
>>
>>
>>
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: Cluster-specific Adaptors

Posted by Bill Graham <bi...@gmail.com>.
+1 on staying stateless.

I think the challenge we're facing is that we're trying to support a
syntax that is simple and readable and can be done with a single line
(i.e. for the initial_adaptors file, the telnet API, the command line,
etc), but the configs can potentially be not-so-simple.

For example, here's how you might configure the JMS adaptor which used
dependency injection. That's a lot for a single line and there's
nowhere to add new global configs in front of the adaptor specific
configs without breaking things.

add jms.JMSAdaptor jms-events
failover:(tcp://jms-host.foo.com:61616,tcp://jms-host.foo.com:61616)
-q some.queue.name -s "id_type IN ('162')" -x
org.apache.hadoop.chukwa.datacollection.adaptor.
jms.JMSMessagePropertyTransformer -p
"event_time,id_type,id,srcurl,xref,xrq,title -r event_time,id_type,id"
0

What if we were to adopt a few flags into the syntax:

add [name =] <adaptor_class_name> <datatype> [--tags <tags>]
[--adaptor-params <adaptor specific params>|--adaptor-config-file
<file>]
<initial offset>

The '--*' flags could be reserved. This would allow us to keep with a
one-line syntax where that approach works, but allow for expansion.
Also, if an adaptor config got to complex, those configs could be
specified in a file if needed.


On Mon, Sep 20, 2010 at 9:52 PM, Jerome Boulon <jb...@netflix.com> wrote:
> Hi,
> If I had to implement this, I will add an extra parameter
> (?extraParams=xyz).
> The adaptorImp will be the only one responsible for parsing this adaptor’s
> specific info.
> I don’t think that we could/should add new complexity in the parsing.
> The same think should be done for getCurrentStatus(), a public result, that
> is the same for all adaptors in order to know if the adaptors is working or
> not and a private section that will give extra information.
>
> Also, moving to a json input should simplify everything.
> /Jerome.
>
> On 9/20/10 5:15 PM, "Bill Graham" <bi...@gmail.com> wrote:
>
> I'd like to hear Ari's take on this, but this does feel a bit hacky to
> me. Plus, it would put the responsibility of parsing tags on each
> adaptor impl and would require a refactor of how each one currently
> parses args.
>
> Actually, we might be able to intercept the call to parseArgs in
> AbstractAdaptor and pull out the tags if they exist and pass the rest
> to the subclass, which would be none the wiser. Not the cleanest, but
> at lease not as intrusive on the adaptor implementations.
>
> Ari, also what about the getCurrentStatus() method? I'd think all the
> impls would somehow need to incorporate tags into that response as
> well, since AFAIR that's what's used to do Adaptor SerDe with the
> checkpoints file.
>
>
> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>> Hi Bill,
>>
>> This might be hacky but it should be possible to have adaptor specific
>> params to include tags.  Ari, what do you think?
>>
>> Regards,
>> Eric
>>
>> On 9/20/10 2:58 PM, "Bill Graham" <bi...@gmail.com> wrote:
>>
>> Hi,
>>
>> In CHUKWA-515 we discussed the possibility being able to add an
>> adaptor bound to a given cluster:
>>
>>
>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>>
>> I can actually see this being useful, especially now that it's easier
>> to add/remove agents with the Adaptor REST API. Looking into the code
>> it doesn't seem like it would be that hard to do, but I want to make
>> sure I'm not overlooking anything.
>>
>> It seems like we could support this with a few small changes:
>>
>> - Add the concept of tags to the Adaptor interface.
>> - AbstractAdator would support a getTags method which would return the
>> union of tags set on the Adaptor and the default tags on the
>> DataFactory.
>> - Internal tag implementations on each would change to store tags in
>> maps, instead of concat'ed strings. This would allow for a "last in
>> wins" type of functionality so tags could be overriden. This assumes
>> of course that there should never be more than one of the same tag key
>> value, which I _assume_ is the case.
>> - The ChunkImpl constructor will call getTags on the agent, instead of
>> getDefaultTags the data factory.
>>
>> The trickiest part as I see it is figuring out how to change the add
>> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
>> both makes sense and doesn't break things. In it's current form it
>> doesn't have room for easy expansion except at the end of the line:
>>
>> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
>> <initial offset>
>>
>> Any thoughts or suggestions? There's also a potential gotcha with all
>> the impls of Adaptor.parseArgs either breaking or needing to
>> change....
>>
>> thanks,
>> Bill
>>
>>
>
>
>

Re: Cluster-specific Adaptors

Posted by Eric Yang <ey...@yahoo-inc.com>.
Having to build state on top of Agent protocol is something that I would
avoid.  I prefer each command to be stateless.  It is better to re-factor
processAddCommand to ensure the simple and clean design can take place.  We
should look into new check_point file format etc.

It's better to do a quick pass on existing adaptors.  Most of this changes
can happen in AbstractAdaptor.  I wouldn't worry about breaking adaptors
because there are very few of them.  I like to make REST API version of
agent protocol standard though.  It is more clear from automation
perspective.

Regards,
Eric


On 9/20/10 5:32 PM, "Ariel Rabkin" <as...@gmail.com> wrote:

> I've been thinking about the adaptor add syntax for a while.  It's
> kinda clunky and hard to extend.
> 
> Here's something I've been considering.  Suppose instead of doing this
> in the add command, we had a separate setTags command that controls
> the tags for the next adaptor[s] started from that conf file or socket
> connection.
> 
> so you'd have stuff like
> settags cluster foo
> add .....
> 
> And then the adaptors wouldn't even have to know about it; you could
> handle this primarily on the framework side.
> 
> This would make the checkpointing somewhat more complex, but I think
> that's potentially not so bad to patch up.  You'd keep a list for each
> adaptor of supplemental commands that should be applied to it, and
> dump those out before checkpointing the adaptor state.
> 
> 
> --Ari
> 
> On Mon, Sep 20, 2010 at 5:15 PM, Bill Graham <bi...@gmail.com> wrote:
>> I'd like to hear Ari's take on this, but this does feel a bit hacky to
>> me. Plus, it would put the responsibility of parsing tags on each
>> adaptor impl and would require a refactor of how each one currently
>> parses args.
>> 
>> Actually, we might be able to intercept the call to parseArgs in
>> AbstractAdaptor and pull out the tags if they exist and pass the rest
>> to the subclass, which would be none the wiser. Not the cleanest, but
>> at lease not as intrusive on the adaptor implementations.
>> 
>> Ari, also what about the getCurrentStatus() method? I'd think all the
>> impls would somehow need to incorporate tags into that response as
>> well, since AFAIR that's what's used to do Adaptor SerDe with the
>> checkpoints file.
>> 
>> 
>> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>>> Hi Bill,
>>> 
>>> This might be hacky but it should be possible to have adaptor specific
>>> params to include tags.  Ari, what do you think?
>>> 
>>> Regards,
>>> Eric
>>> 
>>> On 9/20/10 2:58 PM, "Bill Graham" <bi...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> In CHUKWA-515 we discussed the possibility being able to add an
>>> adaptor bound to a given cluster:
>>> 
>>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&p
>>> age=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_
>>> 12905811
>>> 
>>> I can actually see this being useful, especially now that it's easier
>>> to add/remove agents with the Adaptor REST API. Looking into the code
>>> it doesn't seem like it would be that hard to do, but I want to make
>>> sure I'm not overlooking anything.
>>> 
>>> It seems like we could support this with a few small changes:
>>> 
>>> - Add the concept of tags to the Adaptor interface.
>>> - AbstractAdator would support a getTags method which would return the
>>> union of tags set on the Adaptor and the default tags on the
>>> DataFactory.
>>> - Internal tag implementations on each would change to store tags in
>>> maps, instead of concat'ed strings. This would allow for a "last in
>>> wins" type of functionality so tags could be overriden. This assumes
>>> of course that there should never be more than one of the same tag key
>>> value, which I _assume_ is the case.
>>> - The ChunkImpl constructor will call getTags on the agent, instead of
>>> getDefaultTags the data factory.
>>> 
>>> The trickiest part as I see it is figuring out how to change the add
>>> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
>>> both makes sense and doesn't break things. In it's current form it
>>> doesn't have room for easy expansion except at the end of the line:
>>> 
>>> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
>>> <initial offset>
>>> 
>>> Any thoughts or suggestions? There's also a potential gotcha with all
>>> the impls of Adaptor.parseArgs either breaking or needing to
>>> change....
>>> 
>>> thanks,
>>> Bill
>>> 
>>> 
>> 
> 
> 
> 
> --
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
> 


Re: Cluster-specific Adaptors

Posted by Ariel Rabkin <as...@gmail.com>.
I've been thinking about the adaptor add syntax for a while.  It's
kinda clunky and hard to extend.

Here's something I've been considering.  Suppose instead of doing this
in the add command, we had a separate setTags command that controls
the tags for the next adaptor[s] started from that conf file or socket
connection.

so you'd have stuff like
settags cluster foo
add .....

And then the adaptors wouldn't even have to know about it; you could
handle this primarily on the framework side.

This would make the checkpointing somewhat more complex, but I think
that's potentially not so bad to patch up.  You'd keep a list for each
adaptor of supplemental commands that should be applied to it, and
dump those out before checkpointing the adaptor state.


--Ari

On Mon, Sep 20, 2010 at 5:15 PM, Bill Graham <bi...@gmail.com> wrote:
> I'd like to hear Ari's take on this, but this does feel a bit hacky to
> me. Plus, it would put the responsibility of parsing tags on each
> adaptor impl and would require a refactor of how each one currently
> parses args.
>
> Actually, we might be able to intercept the call to parseArgs in
> AbstractAdaptor and pull out the tags if they exist and pass the rest
> to the subclass, which would be none the wiser. Not the cleanest, but
> at lease not as intrusive on the adaptor implementations.
>
> Ari, also what about the getCurrentStatus() method? I'd think all the
> impls would somehow need to incorporate tags into that response as
> well, since AFAIR that's what's used to do Adaptor SerDe with the
> checkpoints file.
>
>
> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>> Hi Bill,
>>
>> This might be hacky but it should be possible to have adaptor specific
>> params to include tags.  Ari, what do you think?
>>
>> Regards,
>> Eric
>>
>> On 9/20/10 2:58 PM, "Bill Graham" <bi...@gmail.com> wrote:
>>
>> Hi,
>>
>> In CHUKWA-515 we discussed the possibility being able to add an
>> adaptor bound to a given cluster:
>>
>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>>
>> I can actually see this being useful, especially now that it's easier
>> to add/remove agents with the Adaptor REST API. Looking into the code
>> it doesn't seem like it would be that hard to do, but I want to make
>> sure I'm not overlooking anything.
>>
>> It seems like we could support this with a few small changes:
>>
>> - Add the concept of tags to the Adaptor interface.
>> - AbstractAdator would support a getTags method which would return the
>> union of tags set on the Adaptor and the default tags on the
>> DataFactory.
>> - Internal tag implementations on each would change to store tags in
>> maps, instead of concat'ed strings. This would allow for a "last in
>> wins" type of functionality so tags could be overriden. This assumes
>> of course that there should never be more than one of the same tag key
>> value, which I _assume_ is the case.
>> - The ChunkImpl constructor will call getTags on the agent, instead of
>> getDefaultTags the data factory.
>>
>> The trickiest part as I see it is figuring out how to change the add
>> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
>> both makes sense and doesn't break things. In it's current form it
>> doesn't have room for easy expansion except at the end of the line:
>>
>> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
>> <initial offset>
>>
>> Any thoughts or suggestions? There's also a potential gotcha with all
>> the impls of Adaptor.parseArgs either breaking or needing to
>> change....
>>
>> thanks,
>> Bill
>>
>>
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: Cluster-specific Adaptors

Posted by Jerome Boulon <jb...@netflix.com>.
Hi,
If I had to implement this, I will add an extra parameter (?extraParams=xyz).
The adaptorImp will be the only one responsible for parsing this adaptor's specific info.
I don't think that we could/should add new complexity in the parsing.
The same think should be done for getCurrentStatus(), a public result, that is the same for all adaptors in order to know if the adaptors is working or not and a private section that will give extra information.

Also, moving to a json input should simplify everything.
/Jerome.

On 9/20/10 5:15 PM, "Bill Graham" <bi...@gmail.com> wrote:

I'd like to hear Ari's take on this, but this does feel a bit hacky to
me. Plus, it would put the responsibility of parsing tags on each
adaptor impl and would require a refactor of how each one currently
parses args.

Actually, we might be able to intercept the call to parseArgs in
AbstractAdaptor and pull out the tags if they exist and pass the rest
to the subclass, which would be none the wiser. Not the cleanest, but
at lease not as intrusive on the adaptor implementations.

Ari, also what about the getCurrentStatus() method? I'd think all the
impls would somehow need to incorporate tags into that response as
well, since AFAIR that's what's used to do Adaptor SerDe with the
checkpoints file.


On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
> Hi Bill,
>
> This might be hacky but it should be possible to have adaptor specific
> params to include tags.  Ari, what do you think?
>
> Regards,
> Eric
>
> On 9/20/10 2:58 PM, "Bill Graham" <bi...@gmail.com> wrote:
>
> Hi,
>
> In CHUKWA-515 we discussed the possibility being able to add an
> adaptor bound to a given cluster:
>
> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>
> I can actually see this being useful, especially now that it's easier
> to add/remove agents with the Adaptor REST API. Looking into the code
> it doesn't seem like it would be that hard to do, but I want to make
> sure I'm not overlooking anything.
>
> It seems like we could support this with a few small changes:
>
> - Add the concept of tags to the Adaptor interface.
> - AbstractAdator would support a getTags method which would return the
> union of tags set on the Adaptor and the default tags on the
> DataFactory.
> - Internal tag implementations on each would change to store tags in
> maps, instead of concat'ed strings. This would allow for a "last in
> wins" type of functionality so tags could be overriden. This assumes
> of course that there should never be more than one of the same tag key
> value, which I _assume_ is the case.
> - The ChunkImpl constructor will call getTags on the agent, instead of
> getDefaultTags the data factory.
>
> The trickiest part as I see it is figuring out how to change the add
> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
> both makes sense and doesn't break things. In it's current form it
> doesn't have room for easy expansion except at the end of the line:
>
> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
> <initial offset>
>
> Any thoughts or suggestions? There's also a potential gotcha with all
> the impls of Adaptor.parseArgs either breaking or needing to
> change....
>
> thanks,
> Bill
>
>



Re: Cluster-specific Adaptors

Posted by Bill Graham <bi...@gmail.com>.
I'd like to hear Ari's take on this, but this does feel a bit hacky to
me. Plus, it would put the responsibility of parsing tags on each
adaptor impl and would require a refactor of how each one currently
parses args.

Actually, we might be able to intercept the call to parseArgs in
AbstractAdaptor and pull out the tags if they exist and pass the rest
to the subclass, which would be none the wiser. Not the cleanest, but
at lease not as intrusive on the adaptor implementations.

Ari, also what about the getCurrentStatus() method? I'd think all the
impls would somehow need to incorporate tags into that response as
well, since AFAIR that's what's used to do Adaptor SerDe with the
checkpoints file.


On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
> Hi Bill,
>
> This might be hacky but it should be possible to have adaptor specific
> params to include tags.  Ari, what do you think?
>
> Regards,
> Eric
>
> On 9/20/10 2:58 PM, "Bill Graham" <bi...@gmail.com> wrote:
>
> Hi,
>
> In CHUKWA-515 we discussed the possibility being able to add an
> adaptor bound to a given cluster:
>
> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>
> I can actually see this being useful, especially now that it's easier
> to add/remove agents with the Adaptor REST API. Looking into the code
> it doesn't seem like it would be that hard to do, but I want to make
> sure I'm not overlooking anything.
>
> It seems like we could support this with a few small changes:
>
> - Add the concept of tags to the Adaptor interface.
> - AbstractAdator would support a getTags method which would return the
> union of tags set on the Adaptor and the default tags on the
> DataFactory.
> - Internal tag implementations on each would change to store tags in
> maps, instead of concat'ed strings. This would allow for a "last in
> wins" type of functionality so tags could be overriden. This assumes
> of course that there should never be more than one of the same tag key
> value, which I _assume_ is the case.
> - The ChunkImpl constructor will call getTags on the agent, instead of
> getDefaultTags the data factory.
>
> The trickiest part as I see it is figuring out how to change the add
> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
> both makes sense and doesn't break things. In it's current form it
> doesn't have room for easy expansion except at the end of the line:
>
> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
> <initial offset>
>
> Any thoughts or suggestions? There's also a potential gotcha with all
> the impls of Adaptor.parseArgs either breaking or needing to
> change....
>
> thanks,
> Bill
>
>

Re: Cluster-specific Adaptors

Posted by Eric Yang <ey...@yahoo-inc.com>.
Hi Bill,

This might be hacky but it should be possible to have adaptor specific params to include tags.  Ari, what do you think?

Regards,
Eric

On 9/20/10 2:58 PM, "Bill Graham" <bi...@gmail.com> wrote:

Hi,

In CHUKWA-515 we discussed the possibility being able to add an
adaptor bound to a given cluster:

https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811

I can actually see this being useful, especially now that it's easier
to add/remove agents with the Adaptor REST API. Looking into the code
it doesn't seem like it would be that hard to do, but I want to make
sure I'm not overlooking anything.

It seems like we could support this with a few small changes:

- Add the concept of tags to the Adaptor interface.
- AbstractAdator would support a getTags method which would return the
union of tags set on the Adaptor and the default tags on the
DataFactory.
- Internal tag implementations on each would change to store tags in
maps, instead of concat'ed strings. This would allow for a "last in
wins" type of functionality so tags could be overriden. This assumes
of course that there should never be more than one of the same tag key
value, which I _assume_ is the case.
- The ChunkImpl constructor will call getTags on the agent, instead of
getDefaultTags the data factory.

The trickiest part as I see it is figuring out how to change the add
adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
both makes sense and doesn't break things. In it's current form it
doesn't have room for easy expansion except at the end of the line:

add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
<initial offset>

Any thoughts or suggestions? There's also a potential gotcha with all
the impls of Adaptor.parseArgs either breaking or needing to
change....

thanks,
Bill