
Posted to dev@twill.apache.org by Сергей Филиппов <ro...@gmail.com> on 2017/07/01 08:46:37 UTC

External Kafka server for log aggregation

Hello,
I would like to implement the ability to use an external Kafka server for log
aggregation.
Currently Twill uses EmbeddedKafkaServer for that. I think the implementation
would look like this:
1. Add a ZK path where the Kafka ZK connection string will be stored. There
should be only one such path per ApplicationMaster.
2. Use this path in ApplicationKafkaService when creating the
EmbeddedKafkaServer, if there are no brokers there right now.
3. For log aggregation, there should be additional ZK nodes for each
instance, containing the Kafka topic name, something like
"test-app-{UUID}-log". The publisher will send to this topic, and the consumer
on the job submission machine will consume the log messages.
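The three steps above could be sketched roughly as follows. This is a hypothetical illustration, not Twill code: the class `LogAggregationPaths`, its method names, and the ZK path segments `/kafka` and `/logtopics` are assumptions. Only the `test-app-{UUID}-log` topic pattern comes from the proposal. Step 2 is read here as "start the embedded broker only when no external connection string is registered".

```java
import java.util.UUID;

/**
 * Hypothetical helper for the proposed ZooKeeper layout.
 * The path segments ("/kafka", "/logtopics") and method names are
 * assumptions for illustration; they are not part of Twill.
 */
final class LogAggregationPaths {

  private LogAggregationPaths() {}

  /** Step 1: the single node per ApplicationMaster holding the Kafka ZK connection string. */
  static String kafkaConnectPath(String appZkRoot) {
    return appZkRoot + "/kafka";
  }

  /** Step 2 (as read here): use the embedded broker only if no external one is registered. */
  static boolean needsEmbeddedKafka(String registeredConnectString) {
    return registeredConnectString == null || registeredConnectString.trim().isEmpty();
  }

  /** Step 3: one node per instance; its data would be the Kafka topic name. */
  static String instanceTopicPath(String appZkRoot, String runnableName, int instanceId) {
    return appZkRoot + "/logtopics/" + runnableName + "/" + instanceId;
  }

  /** Topic name following the "test-app-{UUID}-log" pattern from the proposal. */
  static String topicName(String appName, UUID uuid) {
    return appName + "-" + uuid + "-log";
  }
}
```

Storing the topic name in a per-instance node would let the consumer on the job submission machine discover which topics to read without knowing the UUIDs in advance.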

What do you think? Does this sound OK?

Sergey

Re: External Kafka server for log aggregation

Posted by Сергей Филиппов <ro...@gmail.com>.

My email for Apache JIRA is: firstrolenof@yandex.ru

Sergey

On Tue, Jul 4, 2017 at 8:33, Terence Yim <ch...@gmail.com> wrote:

> Sure, you are welcome to do so. You can also update the JIRA with your
> proposed change before sending the PR.
>
> What is your Apache JIRA email? I have to add you to the Twill project
> before I can assign the JIRA to you.
>
> Terence
>
> Sent from my iPhone

Re: External Kafka server for log aggregation

Posted by Terence Yim <ch...@gmail.com>.

Sure, you are welcome to do so. You can also update the JIRA with your proposed change before sending the PR.

What is your Apache JIRA email? I have to add you to the Twill project before I can assign the JIRA to you.

Terence

> On Jul 3, 2017, at 12:57 AM, Сергей Филиппов <ro...@gmail.com> wrote:
> 
> Hi Terence,
> Thank you for the description! Should I assign this issue to myself?
>
> Sergey

Re: External Kafka server for log aggregation

Posted by Сергей Филиппов <ro...@gmail.com>.

Hi Terence,
Thank you for the description! Should I assign this issue to myself?

Sergey

On Sat, Jul 1, 2017 at 21:23, Terence Yim <ch...@gmail.com> wrote:

> Hi Sergey,
>
> I think you are talking about TWILL-147
> (https://issues.apache.org/jira/browse/TWILL-147), right? The idea there is
> that we don't need to start an EmbeddedKafkaServer in the AM at all. Instead,
> the AM just takes a configuration (via TwillPreparer, which can fall back to
> a default value in the Configuration object passed to YarnTwillRunnerService)
> that specifies the Kafka broker list and the topic the AM will publish to.
>
> Since under this model logs from different applications may be sent to the
> same Kafka topic (depending on the configuration), the LogEntry needs to be
> modified to carry the application and run ID, so that the TwillController
> can filter on them on the client side.
>
> Terence

Re: External Kafka server for log aggregation

Posted by Terence Yim <ch...@gmail.com>.

Hi Sergey,

I think you are talking about TWILL-147
(https://issues.apache.org/jira/browse/TWILL-147), right? The idea there is
that we don't need to start an EmbeddedKafkaServer in the AM at all. Instead,
the AM just takes a configuration (via TwillPreparer, which can fall back to
a default value in the Configuration object passed to YarnTwillRunnerService)
that specifies the Kafka broker list and the topic the AM will publish to.
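A minimal sketch of the lookup order described above, assuming plain `Map`s stand in for the settings given to TwillPreparer and for the Configuration passed to YarnTwillRunnerService. The class `KafkaLogConfig` and the keys `log.kafka.brokers` and `log.kafka.topic` are hypothetical; they are not real Twill configuration keys.

```java
import java.util.Map;

/**
 * Hypothetical sketch: a value set for one application (standing in for
 * TwillPreparer) overrides a cluster-wide default (standing in for the
 * Configuration given to YarnTwillRunnerService).
 * Key names are assumptions, not real Twill configuration keys.
 */
final class KafkaLogConfig {
  static final String BROKERS_KEY = "log.kafka.brokers";
  static final String TOPIC_KEY = "log.kafka.topic";

  final String brokerList;
  final String topic;

  private KafkaLogConfig(String brokerList, String topic) {
    this.brokerList = brokerList;
    this.topic = topic;
  }

  /** Resolves the broker list and topic the AM would publish to. */
  static KafkaLogConfig resolve(Map<String, String> perApplication, Map<String, String> defaults) {
    return new KafkaLogConfig(
        lookup(BROKERS_KEY, perApplication, defaults),
        lookup(TOPIC_KEY, perApplication, defaults));
  }

  // Per-application value first, then the default; fail fast if neither is set.
  private static String lookup(String key, Map<String, String> primary, Map<String, String> fallback) {
    String value = primary.containsKey(key) ? primary.get(key) : fallback.get(key);
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException("Missing required setting: " + key);
    }
    return value;
  }
}
```

Resolving each key independently means an application can override only the topic while still using the default broker list.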

Since under this model logs from different applications may be sent to the
same Kafka topic (depending on the configuration), the LogEntry needs to be
modified to carry the application and run ID, so that the TwillController
can filter on them on the client side.
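To illustrate the client-side filtering, here is a sketch assuming the modified entry exposes the application name and run ID. `TaggedLogEntry` and `RunFilteringLogHandler` are hypothetical stand-ins, not Twill's `LogEntry` or `LogHandler` types. The filter forwards only entries from its own application run and drops everything else that shares the topic.

```java
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical stand-in for a log entry extended with the application
 * name and run ID; this is not Twill's LogEntry interface.
 */
final class TaggedLogEntry {
  final String applicationName;
  final String runId;
  final String message;

  TaggedLogEntry(String applicationName, String runId, String message) {
    this.applicationName = Objects.requireNonNull(applicationName);
    this.runId = Objects.requireNonNull(runId);
    this.message = message;
  }
}

/**
 * Hypothetical client-side filter: many applications may publish to one
 * shared topic, so each controller forwards only its own run's entries.
 */
final class RunFilteringLogHandler {
  private final String applicationName;
  private final String runId;
  private final Consumer<TaggedLogEntry> delegate;

  RunFilteringLogHandler(String applicationName, String runId, Consumer<TaggedLogEntry> delegate) {
    this.applicationName = applicationName;
    this.runId = runId;
    this.delegate = delegate;
  }

  /** Called for every entry read from the shared topic. */
  void onLog(TaggedLogEntry entry) {
    if (applicationName.equals(entry.applicationName) && runId.equals(entry.runId)) {
      delegate.accept(entry);
    }
  }
}
```

Matching on both fields matters: two runs of the same application, or two applications reusing a run ID format, could otherwise see each other's logs.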

Terence
