Posted to user@spark.apache.org by Michael Campbell <mi...@gmail.com> on 2014/06/11 19:47:20 UTC

Having trouble with streaming (updateStateByKey)

I'm having a little trouble getting an "updateStateByKey()" call to work;
was wondering if anyone could help.

In my chain of calls from getting Kafka messages out of the queue to
converting the message to a set of "things", then pulling out 2 attributes
of those things to a Tuple2, everything works.

So what I end up with is a dump about once a second of pairs like this (this
is crufted up data, but each pair is basically 2 IPv6 addresses):

-------------------------------------------
Time: 1402507839000 ms
-------------------------------------------
(::ffff:a14:b03,::ffff:a0a:2bd4)
(::ffff:a14:b03,::ffff:a0a:2bd4)
(::ffff:a0a:25a7,::ffff:a14:b03)
(::ffff:a14:b03,::ffff:a0a:2483)
(::ffff:a14:b03,::ffff:a0a:2480)
(::ffff:a0a:2d96,::ffff:a14:b03)
(::ffff:a0a:abb5,::ffff:a14:100)
(::ffff:a0a:dcd7,::ffff:a14:28)
(::ffff:a14:28,::ffff:a0a:da94)
(::ffff:a14:b03,::ffff:a0a:2def)
...


This works ok.

The problem comes when I add a call to updateStateByKey: the streaming app
runs and runs and never outputs anything.  When I debug, I can't confirm
that the update function I pass in is ever actually called.

I have breakpoints and println statements in my updateFunc, and it LOOKS
like it's never called.  I can confirm that updateStateByKey itself IS
called (execution stops on a breakpoint there).

Is there something obvious I'm missing?
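
For reference, the contract an update function is expected to satisfy can be
modeled outside Spark. The sketch below is plain Python, not Spark code;
update_count, run_batch, and the sample pairs are hypothetical names and data
used only for illustration. It assumes the documented behavior that the update
function is called once per batch for every key that has new values or
existing state, and that returning None drops the key. In a real job it is
also worth checking that a checkpoint directory is set on the streaming
context and that an output operation (such as print) is attached to the
stream returned by updateStateByKey, since DStream transformations are only
computed when an output operation needs them.

```python
from collections import defaultdict


def update_count(new_values, state):
    """Model of an update function: add this batch's values to the running total."""
    return (state or 0) + sum(new_values)


def run_batch(batch, states, update_func):
    """Apply update_func once per key that has new values or existing state."""
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    next_states = {}
    for key in set(grouped) | set(states):
        new_state = update_func(grouped.get(key, []), states.get(key))
        if new_state is not None:  # returning None drops the key
            next_states[key] = new_state
    return next_states


# Two batches of (source address, 1) pairs, standing in for two 1-second intervals.
batch1 = [("::ffff:a14:b03", 1), ("::ffff:a14:b03", 1), ("::ffff:a0a:25a7", 1)]
batch2 = [("::ffff:a14:b03", 1)]

states = run_batch(batch1, {}, update_count)
states = run_batch(batch2, states, update_count)
print(sorted(states.items()))
```

Note that a key with no new values in a batch still reaches the update
function (with an empty list), which is why the second address keeps its
count after the second batch in this model.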

Re: Having trouble with streaming (updateStateByKey)

Posted by Michael Campbell <mi...@gmail.com>.
I rearranged my code to use reduceByKey, which I think is working.  I also
don't think the problem was the updateStateByKey call, but something else.
Unfortunately I changed a lot while looking for this issue, so I'm not sure
what the actual fix was, but I think it's working now.
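
One difference to keep in mind with this change: reduceByKey on a stream
combines values only within each batch, so the result starts over every
interval, whereas updateStateByKey carries a value across batches. A
plain-Python model of the per-batch behavior (not Spark code; reduce_batch
and the sample data are hypothetical):

```python
from operator import add


def reduce_batch(batch, func):
    """Model of a per-batch reduceByKey: combine values per key, with no memory of earlier batches."""
    out = {}
    for key, value in batch:
        out[key] = func(out[key], value) if key in out else value
    return out


# Each batch is reduced on its own; nothing is carried over between them.
batch1 = [("::ffff:a14:b03", 1), ("::ffff:a14:b03", 1)]
batch2 = [("::ffff:a14:b03", 1)]

result1 = reduce_batch(batch1, add)
result2 = reduce_batch(batch2, add)
print(result1)
print(result2)
```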

