Posted to user@storm.apache.org by "Elder, Catherine" <Ca...@viasat.com> on 2014/10/05 02:06:26 UTC

storm spout reading from head of stream dies after a few minutes

Hi everyone,

I have a kafka-storm problem where the kafka spout stops sending tuples after a few minutes. Any advice would be greatly appreciated.

I have a storm topology reading from a kafka spout (storm and storm-kafka 0.9.2). My problem is, when I configure the spout to read from the head of the stream (SpoutConfig startOffsetTime = -1), the spout sends tuples for 5 or 10 minutes, then stops. It emits a small burst of tuples once every 8 hours after that.
Another weird thing: when I configure the spout to read from the tail of the stream, 24 hrs ago (SpoutConfig startOffsetTime = -2), the spout sends tuples without stopping, although the timestamps of the tuples stay 24 hrs behind (the spout doesn’t fall behind or catch up).
Also, the total volume of tuples in either case is a lot lower than I expect.
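
For reference, the two startOffsetTime values mentioned above are Kafka's sentinel offset times: -1 is kafka.api.OffsetRequest.LatestTime() and -2 is kafka.api.OffsetRequest.EarliestTime(). The sketch below is a simplified Python model of how those sentinels resolve to a starting position, not storm-kafka code; resolve_start_offset and its offset arguments are hypothetical names, and the offsets are made up for illustration.

```python
LATEST_TIME = -1    # value of kafka.api.OffsetRequest.LatestTime()
EARLIEST_TIME = -2  # value of kafka.api.OffsetRequest.EarliestTime()


def resolve_start_offset(start_offset_time, earliest_offset, latest_offset):
    """Map a sentinel offset time to a concrete starting offset (hypothetical helper)."""
    if start_offset_time == LATEST_TIME:
        return latest_offset
    if start_offset_time == EARLIEST_TIME:
        return earliest_offset
    raise ValueError("this sketch only models the -1 and -2 sentinels")


# Made-up partition: oldest retained offset is 1000, next offset to be written is 5000.
print(resolve_start_offset(LATEST_TIME, 1000, 5000))
print(resolve_start_offset(EARLIEST_TIME, 1000, 5000))
```

With -1 a reader starts at the end of the log and only sees messages produced afterwards; with -2 it starts at the oldest message still retained.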

Here’s a graph of the 1 minute rate of tuples hitting my first bolt (head of stream in blue, tail of stream in green). You can see the "head of stream" topology drying up after a few minutes.

I have spout metrics enabled, but I’m not sure what to look for. I attached some metrics (emitted after both topologies ran for an hour).

Thanks,
Catherine

Re: storm spout reading from head of stream dies after a few minutes

Posted by Lloyd Chang <ll...@gmail.com>.
Hi Catherine,

You wrote, "*I have spout metrics enabled, but I’m not sure what to look
for,*" and "*spout sends tuples for 5 or 10 minutes, then stops. It emits a
small burst of tuples once every 8 hours after that.*" In your attached
logs:
  head: *[__complete-latency = {}]*
  tail: *[__complete-latency = {default=63.05095541401274}]*
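
The two excerpts above can also be compared programmatically. This is a minimal Python sketch, assuming each metric is logged as `[name = {stream=value, ...}]` exactly as in the lines quoted; parse_metric and diff_metrics are hypothetical helper names, not part of Storm.

```python
import re

# Matches e.g. "[__complete-latency = {default=63.05}]" (format assumed from the excerpts).
METRIC_RE = re.compile(r"\[\s*(?P<name>[\w.-]+)\s*=\s*\{(?P<body>[^}]*)\}\s*\]")


def parse_metric(line):
    """Return (metric_name, {stream: value}) parsed from one log line."""
    match = METRIC_RE.search(line)
    if match is None:
        raise ValueError(f"no metric found in: {line!r}")
    values = {}
    for pair in match.group("body").split(","):
        pair = pair.strip()
        if pair:
            stream, _, raw = pair.partition("=")
            values[stream.strip()] = float(raw)
    return match.group("name"), values


def diff_metrics(head, tail):
    """Return {stream: (head_value, tail_value)} for streams whose values differ."""
    streams = sorted(set(head) | set(tail))
    return {s: (head.get(s), tail.get(s)) for s in streams if head.get(s) != tail.get(s)}


_, head = parse_metric("[__complete-latency = {}]")
_, tail = parse_metric("[__complete-latency = {default=63.05095541401274}]")
print(diff_metrics(head, tail))
```

Here the head topology has no `default` entry at all. That would be consistent with no tuples being acked at the spout during that reporting window, though that reading is an assumption rather than something the logs state.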

• Does your head stream spout eventually complete after some number *x* of
8-hour cycles?
• How about obtaining more information via the Storm UI REST API (
https://github.com/apache/storm/blob/master/STORM-UI-REST-API.md)?
• And running a `diff` of the metrics between your head stream and tail
stream spouts?
• Or a `diff` of additional logs in lieu of the REST API?
• Conceptually, it sounds like Little's Law (
http://en.wikipedia.org/wiki/Little's_law) is in effect, and there's a
queue of up to 8 hours in your head stream spout.
• Specific to Storm, here are tips from Nathan Marz (via
https://groups.google.com/forum/#!msg/storm-user/Vsna6qVhH4E/IDpwQSlw0sMJ)
that might help you: "*It really depends what your bottleneck is. First off
- 0.8.2 has a much more detailed Storm UI that provides useful metrics that
would help determine if your bottleneck is CPU. You should also look at the
CPU / network usage graphs on the machines to determine if they're
saturated. Alternatively, your bottleneck might be on a database if
you're using one.*"
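
Little's Law, mentioned above, relates the average number of items in a stable system (L), the average arrival rate (lambda), and the average time an item spends in the system (W) as L = lambda * W. A minimal Python sketch of the rearranged form follows; average_time_in_system is a hypothetical helper, and the backlog and rate are made-up numbers chosen only to illustrate an 8-hour wait, not measurements from this thread.

```python
def average_time_in_system(items_in_system, arrival_rate_per_sec):
    """Little's Law rearranged: W = L / lambda, in seconds."""
    if arrival_rate_per_sec <= 0:
        raise ValueError("arrival rate must be positive")
    return items_in_system / arrival_rate_per_sec


backlog = 288_000  # L: tuples waiting in the system (made-up value)
rate = 10.0        # lambda: tuples per second (made-up value)
wait_hours = average_time_in_system(backlog, rate) / 3600
print(wait_hours)
```

Read the other way, a sustained 8-hour delay at a known input rate implies a backlog of roughly rate times delay, which gives a number to look for in the spout and bolt metrics.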

Cheers,
Lloyd

On Sat, Oct 4, 2014 at 5:09 PM, Elder, Catherine <Catherine.Elder@viasat.com> wrote:

>  Whoops, mislabeled the colors in the graph. Corrected below.

Re: storm spout reading from head of stream dies after a few minutes

Posted by "Elder, Catherine" <Ca...@viasat.com>.
Whoops, mislabeled the colors in the graph. Corrected below.

From: "Elder, Catherine" <Ca...@viasat.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Saturday, October 4, 2014 at 5:06 PM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: storm spout reading from head of stream dies after a few minutes

Hi everyone,

I have a kafka-storm problem where the kafka spout stops sending tuples after a few minutes. Any advice would be greatly appreciated.

I have a storm topology reading from a kafka spout (storm and storm-kafka 0.9.2). My problem is, when I configure the spout to read from the head of the stream (SpoutConfig startOffsetTime = -1), the spout sends tuples for 5 or 10 minutes, then stops. It emits a small burst of tuples once every 8 hours after that.
Another weird thing: when I configure the spout to read from the tail of the stream, 24 hrs ago (SpoutConfig startOffsetTime = -2), the spout sends tuples without stopping, although the timestamps of the tuples stay 24 hrs behind (the spout doesn’t fall behind or catch up).
Also, the total volume of tuples in either case is a lot lower than I expect.

Here’s a graph of the 1 minute rate of tuples hitting my first bolt (head of stream in green, tail of stream in blue). You can see the "head of stream" topology drying up after a few minutes.

I have spout metrics enabled, but I’m not sure what to look for. I attached some metrics (emitted after both topologies ran for an hour).

Thanks,
Catherine