Posted to dev@samza.apache.org by "Bae, Jae Hyeon" <me...@gmail.com> on 2015/02/06 20:11:38 UTC

Question on nullEnvelop

Could you explain why consumerMultiplexer.choose returns null?

Can it happen when there's no message in the kafka topic?

If my theory is correct, the frequency seems too high: in my testing
environment, it's more than 50 per second.

Thank you
Best, Jae

Re: Question on nullEnvelop

Posted by Zach Cox <zc...@gmail.com>.
I just added a comment to https://issues.apache.org/jira/browse/SAMZA-506
with details on our current approach to clean shutdown in Samza 0.8.0,
hopefully it's useful to others.


On Fri, Feb 6, 2015 at 2:39 PM, Chris Riccomini <cr...@apache.org>
wrote:

> Hey Jae,
>
> > If so, what's the best way to shutdown the container without using
> > command topic?
>
> YARN does send a SIGTERM before SIGKILL. The config in YARN to set the
> latency is here:
>
>   yarn.nodemanager.sleep-delay-before-sigkill.ms
>
> The default is 250ms. Samza does *not* currently handle the SIGTERM
> gracefully (it doesn't shut itself down). The ticket to do this is here:
>
>   https://issues.apache.org/jira/browse/SAMZA-506
>
> If you'd like to work on that patch, that should make it work. If not, yes,
> you'll have to use some form of a shutdown command. Zach (the guy who
> opened the JIRA) was able to hack around this himself by adding a shutdown
> hook. You could do something similar, if you want: add a shutdown hook that
> sets a variable, have window() check the variable every N ms, and call
> coordinator.shutdown if it's set to true. You'd probably also have to raise
> the delay to more than 250ms in YARN.
>
> Options:
>
> 1. Use a topic like samza_command.
> 2. Fix SAMZA-506.
> 3. Write a custom shutdown hook with a static variable.
>
> > Does it hurt overall processing performance? I don't think so, but I
> > want to confirm.
>
> Nope, shouldn't. It only sleeps during "idle" time (no messages available).
> When there are messages available, you shouldn't get null_envelopes (unless
> you have a custom MessageChooser that withholds available messages, which I
> doubt you do).
>
> Cheers,
> Chris

Re: Question on nullEnvelop

Posted by Chris Riccomini <cr...@apache.org>.
Hey Jae,

> If so, what's the best way to shutdown the container without using
> command topic?

YARN does send a SIGTERM before SIGKILL. The config in YARN to set the
latency is here:

  yarn.nodemanager.sleep-delay-before-sigkill.ms

The default is 250ms. Samza does *not* currently handle the SIGTERM
gracefully (it doesn't shut itself down). The ticket to do this is here:

  https://issues.apache.org/jira/browse/SAMZA-506

If you'd like to work on that patch, that should make it work. If not, yes,
you'll have to use some form of a shutdown command. Zach (the guy who
opened the JIRA) was able to hack around this himself by adding a shutdown
hook. You could do something similar, if you want: add a shutdown hook that
sets a variable, have window() check the variable every N ms, and call
coordinator.shutdown if it's set to true. You'd probably also have to raise
the delay to more than 250ms in YARN.
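
A minimal sketch of that shutdown-hook idea, in plain Java. The Coordinator
interface below is a hypothetical stand-in for the coordinator Samza passes to
window(); the real interface and its shutdown signature may differ. The latch
is an addition not mentioned above: the JVM exits as soon as all shutdown
hooks return, so the hook has to wait (with a bound) for the task to stop.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the coordinator passed to window(); the real
// Samza interface and its shutdown signature may differ.
interface Coordinator {
    void shutdown();
}

class ShutdownFlagTask {
    // Static state shared by the JVM shutdown hook and the task.
    static final AtomicBoolean SHUTDOWN_REQUESTED = new AtomicBoolean(false);
    static final CountDownLatch STOPPED = new CountDownLatch(1);

    // Body of the hook: set the flag, then wait (bounded) for the task to
    // stop, because the JVM exits once all shutdown hooks have returned.
    static void onSigterm(long maxWaitMs) {
        SHUTDOWN_REQUESTED.set(true);
        try {
            STOPPED.await(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Register the hook once at startup, e.g. from the task's init code.
    static void installHook(long maxWaitMs) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> onSigterm(maxWaitMs)));
    }

    // Called every N ms; returns true if it asked the container to shut down.
    boolean window(Coordinator coordinator) {
        if (!SHUTDOWN_REQUESTED.get()) {
            return false;
        }
        coordinator.shutdown();
        return true;
    }

    // Call when the task has finished closing, to release the waiting hook.
    static void markStopped() {
        STOPPED.countDown();
    }
}
```

The wait passed to installHook only helps if it is shorter than the YARN
delay between SIGTERM and SIGKILL, which is why that delay would need to be
raised.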

Options:

1. Use a topic like samza_command.
2. Fix SAMZA-506.
3. Write a custom shutdown hook with a static variable.

> Does it hurt overall processing performance? I don't think so, but I
> want to confirm.

Nope, shouldn't. It only sleeps during "idle" time (no messages available).
When there are messages available, you shouldn't get null_envelopes (unless
you have a custom MessageChooser that withholds available messages, which I
doubt you do).

Cheers,
Chris

On Fri, Feb 6, 2015 at 12:30 PM, Bae, Jae Hyeon <me...@gmail.com> wrote:

> What I am doing is, consuming two topics, samza_input and samza_command.
> samza_command will have some control command something like "shutdown,all"
> because kill-yarn-job.sh does not gracefully shutdown SamzaContainer. Am I
> correct? If so, what's the best way to shutdown the container without using
> command topic?
>
> 10ms explains why 50 null envelopes were consumed per second. Does it hurt
> overall processing performance? I don't think so, but I want to confirm.
>
> Thank you
> Best, Jae

Re: Question on nullEnvelop

Posted by "Bae, Jae Hyeon" <me...@gmail.com>.
What I am doing is consuming two topics, samza_input and samza_command.
samza_command carries control commands such as "shutdown,all", because
kill-yarn-job.sh does not gracefully shut down SamzaContainer. Am I
correct? If so, what's the best way to shut down the container without using
a command topic?

10ms explains why 50 null envelopes were consumed per second. Does it hurt
overall processing performance? I don't think so, but I want to confirm.

Thank you
Best, Jae

On Fri, Feb 6, 2015 at 12:16 PM, Chris Riccomini <cr...@apache.org>
wrote:

> Hey Jae,
>
> SamzaContainer polls for new messages by calling
> consumerMultiplexer.choose. In a case where there are no messages
> available, choose will return null. The next time choose is called, it will
> be invoked with a timeout (the default is 10ms). This time, the poll call
> will block until either 1) the timeout is hit or 2) there is a new message available
> to process. This is to prevent a tight loop.
>
> > its frequency is too high, in my testing environment, it's more than 50
> > per second.
>
> Why do you think this is too high? It either has to do this, or sleep for
> longer. The longer the container sleeps, the more latency that's introduced
> when there *is* a message available. 10ms is what we use by default.
>
> Cheers,
> Chris

Re: Question on nullEnvelop

Posted by Chris Riccomini <cr...@apache.org>.
Hey Jae,

SamzaContainer polls for new messages by calling
consumerMultiplexer.choose. When no messages are available, choose returns
null. The next time choose is called, it is invoked with a timeout (the
default is 10ms). This time, the poll call blocks until either 1) the timeout
is hit or 2) a new message is available to process. This prevents a tight loop.
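
To make that concrete, here is a rough sketch of the behavior in plain Java.
It is not the actual SamzaContainer code: PollLoopSketch, its queue, and its
fields are hypothetical stand-ins, with a BlockingQueue playing the role of
the buffer of available messages.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PollLoopSketch {
    // Stand-in for the messages currently available to the container.
    final BlockingQueue<String> available = new LinkedBlockingQueue<>();
    private final long timeoutMs;
    // True when the previous choose() found nothing.
    private boolean previousWasNull = false;

    PollLoopSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Returns the next message, or null if none is available. After a null,
    // the next call blocks for up to timeoutMs instead of returning at once.
    String choose() {
        try {
            String envelope = previousWasNull
                ? available.poll(timeoutMs, TimeUnit.MILLISECONDS)
                : available.poll();
            previousWasNull = (envelope == null);
            return envelope;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        PollLoopSketch loop = new PollLoopSketch(10);
        int nulls = 0;
        long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(1);
        // Idle loop: no messages are ever added, so every choose() is null.
        while (System.nanoTime() < end) {
            if (loop.choose() == null) {
                nulls++;
            }
        }
        System.out.println("null envelopes in ~1s of idle time: " + nulls);
    }
}
```

Because the blocking poll returns as soon as a message arrives, the timeout
only costs time while the queue is empty.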

> its frequency is too high, in my testing environment, it's more than 50
> per second.

Why do you think this is too high? It either has to do this, or sleep for
longer. The longer the container sleeps, the more latency that's introduced
when there *is* a message available. 10ms is what we use by default.
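
As a back-of-the-envelope check, assuming one null envelope per timed-out
poll (a simplification; IdlePollMath is a hypothetical helper, not part of
Samza), the timeout puts an upper bound on how many null envelopes an idle
container can see per second:

```java
class IdlePollMath {
    // Upper bound on null envelopes per second while idle, assuming each
    // timed-out poll of timeoutMs milliseconds yields exactly one null.
    static long maxNullEnvelopesPerSecond(long timeoutMs) {
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("timeoutMs must be positive");
        }
        return 1000 / timeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(maxNullEnvelopesPerSecond(10));
    }
}
```

With the 10ms default the bound is 100 per second, so a measured rate of more
than 50 per second on an idle container is within the expected range.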

Cheers,
Chris

On Fri, Feb 6, 2015 at 11:11 AM, Bae, Jae Hyeon <me...@gmail.com> wrote:

> Could you explain why consumerMultiplexer.choose returns null?
>
> Can it happen when there's no message in the kafka topic?
>
> If my theory is correct, its frequency is too high, in my testing
> environment, it's more than 50 per second.
>
> Thank you
> Best, Jae
>