Posted to dev@samza.apache.org by John Dennison <de...@gmail.com> on 2016/02/10 20:20:37 UTC

Zombie writers protection

Greetings,

I have a general design question that I did not see addressed in the docs.
Basically, how does Samza guarantee a single writer for each changelog
partition? Given the strong ordering assumption of these changelogs, how do
you protect against zombie processes writing out-of-date values to the
changelog?
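
To illustrate the concern, here is a minimal sketch of why a late write
matters. It is not Samza code: the changelog is modeled as a plain Python
list of (key, value) pairs in offset order, and restore() is a hypothetical
helper that rebuilds state by replaying it, last write wins.

```python
def restore(changelog):
    """Rebuild a key-value store by replaying entries in offset order."""
    state = {}
    for key, value in changelog:
        state[key] = value  # a later entry overwrites an earlier one
    return state


# Hypothetical changelog partition: (key, value) pairs in offset order.
changelog = [("count", 1), ("count", 2)]

# The live owner of the partition appends the newest value.
changelog.append(("count", 3))

# A zombie that still believes it owns the partition flushes a stale value.
changelog.append(("count", 2))

restored = restore(changelog)
print(restored)
```

Because the stale entry lands at a higher offset, a restore from this
changelog ends up with the zombie's value rather than the live owner's.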

Thanks,

John

Re: Zombie writers protection

Posted by Yi Pan <ni...@gmail.com>.
Hi, Rick and John,

Thanks for the great discussion! As Jacob said, we also realized the possible
drawbacks of relying solely on YARN for process liveness detection, which is
why SAMZA-871 was opened. Please comment on the JIRA so that we can track
the discussion and move the design process forward.

Thanks a lot!

-Yi


Re: Zombie writers protection

Posted by Rick Mangi <ri...@chartbeat.com>.
Jake, not my question; I was just adding my 2 cents :)

John, it’s not exactly that YARN is responsible for maintaining one instance of each container. Samza has an abstract management layer that defers this to YARN, but some people bypass YARN altogether and manage their containers themselves, or run on things like Mesos.

For your purposes though, if you are using YARN, then yes, this is YARN’s job.

The case I ran into was with Cloudera’s distro of YARN, on an older version of Ubuntu and YARN. I haven’t seen zombies since moving to the latest YARN distro.




Re: Zombie writers protection

Posted by Jacob Maes <ja...@gmail.com>.
Hey Rick,

If I understand your question, the goal is really to make sure there are no
orphaned containers that continue to run "off the books".

The newly added SAMZA-871 describes a heartbeat mechanism to make sure
orphaned containers actually get killed.
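
As a rough illustration of the general idea only (the actual SAMZA-871
design is in the JIRA, not in this thread), a container-side watchdog could
look something like the sketch below. The HeartbeatMonitor class, its
method names, and the 30-second timeout are all assumptions made for the
example; time is passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Container-side watchdog (hypothetical): stop if acks stop arriving."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_ack = now
        self.running = True

    def on_ack(self, now):
        """Record that the coordinator acknowledged a heartbeat."""
        if self.running:
            self.last_ack = now

    def check(self, now):
        """Return True while the container may keep running."""
        if self.running and now - self.last_ack > self.timeout_s:
            # A real container would stop writing and exit at this point.
            self.running = False
        return self.running


monitor = HeartbeatMonitor(timeout_s=30, now=0)
monitor.on_ack(now=10)
alive_at_35 = monitor.check(now=35)  # 25s since the last ack
alive_at_50 = monitor.check(now=50)  # 40s since the last ack
print(alive_at_35, alive_at_50)
```

A timeout like this narrows the window in which an orphaned container can
keep writing; on its own it does not close that window completely.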

Also, the YARN Node Manager Restart capability might help. We're in the
process of testing this at LinkedIn:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html

-Jake


Re: Zombie writers protection

Posted by John Dennison <de...@gmail.com>.
To second Rick's point: it's less about malicious actors and more about
containers that were thought lost due to a network partition popping up
later and starting to write to the changelog. I assume from Rick's response
that YARN is responsible for ensuring only one instance of each container is
running, and that Samza has nothing internal to deal with this.

I guess you could hijack Kafka's auth framework to block old zombie
containers from writing: use some global lock's incrementing token as the
password. A zombie process would auth with an old token and be denied. I
haven't looked, but I imagine the 0.9.0 auth framework isn't done at the
partition level.
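
A minimal sketch of that incrementing-token idea, assuming the check could
be made wherever writes are accepted. TokenIssuer and FencedLog are
hypothetical stand-ins for the global lock and the log partition; nothing
here is a Kafka or Samza API, and as noted above the Kafka auth framework
may not allow a per-partition check at all.

```python
class TokenIssuer:
    """Stand-in for a global lock that hands out increasing tokens."""

    def __init__(self):
        self._token = 0

    def acquire(self):
        self._token += 1
        return self._token


class FencedLog:
    """Stand-in for a log partition that rejects stale-token writers."""

    def __init__(self):
        self.entries = []
        self.highest_token = 0

    def append(self, token, key, value):
        if token < self.highest_token:
            return False  # writer holds an old token: deny the write
        self.highest_token = token
        self.entries.append((key, value))
        return True


issuer = TokenIssuer()
log = FencedLog()

old_token = issuer.acquire()              # original container
first_ok = log.append(old_token, "count", 1)

new_token = issuer.acquire()              # replacement container
second_ok = log.append(new_token, "count", 2)

zombie_ok = log.append(old_token, "count", 1)  # original wakes up late
print(first_ok, second_ok, zombie_ok)
```

Once the replacement container has written with the newer token, the late
write from the original container is refused and never reaches the log.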


Re: Zombie writers protection

Posted by Rick Mangi <ri...@chartbeat.com>.
Security wouldn’t stop zombie processes from writing to Kafka. I had this problem with YARN before, where the container thought it was killing jobs but they never actually died, and in fact continued to write to Kafka.



Re: Zombie writers protection

Posted by Jagadish Venkatraman <ja...@gmail.com>.
Hi John

Currently there is no authorization on who writes to Kafka. There is a
Kafka security proposal that the Kafka community is working on:
https://cwiki.apache.org/confluence/display/KAFKA/Security

Building this into Samza may entail expensive coordination (to keep other
jobs from writing). Since jobs are usually run in a trusted environment,
I've not seen people requesting this use case. Even if we did build this
into Samza, nothing stops people from writing to that Kafka topic by
bypassing Samza completely (through the Kafka producer or an external
library).

I'd think Kafka would build support for authorization, principals, roles,
etc. in the future, and Samza can leverage it once it's done.

Thoughts?
