You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Edmondo Porcu <ed...@gmail.com> on 2018/05/22 10:12:46 UTC

How does KStream transform performs repartitioning?

Hello users,

we are performing a Transform so that out of a larger message we emit a new
output record only if that specific field has changed.


Since we introduced that to reduce the number of output records, our final
Kstream - KStream windowed join is not ticking anymore, although the window
is large enough (1year). Our both streams have 3 partitions.

We noticed that in the KStreams documentation, the following sentence
appears:

     * Transforming records might result in an internal data redistribution
if a key based operator (like an aggregation
     * or join) is applied to the result {@code KStream}.
     * (cf. {@link #transformValues(ValueTransformerSupplier, String...)})

What does this redistribution mean and how does it work? Why have we lost
joins result?

Thank you

Re: How does KStream transform performs repartitioning?

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Edmondo,

If you have a join operator following the transform() operator, then the
joining streams will be sent to a repartition topic, and the join
operator's hosted thread will then read from that repartition topic. This
is for "re-shuffling" the streams since the key of the stream record may
have changed after transform. It should not cause joining to not tick. But
I'd suggest you check your repartition topic and see if it is correctly
created and populated (note that if you add such an operator into your app,
your topology may have been changed largely, which you can check by calling
Topology#describe() to see the difference, such that you may not be able to
do a in-place upgrade any more, but have to restart it after using the
streams reset tool).


Guozhang


On Tue, May 22, 2018 at 3:12 AM, Edmondo Porcu <ed...@gmail.com>
wrote:

> Hello users,
>
> we are performing a Transform so that out of a larger message we emit a new
> output record only if that specific field has changed.
>
>
> Since we introduced that to reduce the number of output records, our final
> Kstream - KStream windowed join is not ticking anymore, although the window
> is large enough (1year). Our both streams have 3 partitions.
>
> We noticed that in the KStreams documentation, the following sentence
> appears:
>
>      * Transforming records might result in an internal data redistribution
> if a key based operator (like an aggregation
>      * or join) is applied to the result {@code KStream}.
>      * (cf. {@link #transformValues(ValueTransformerSupplier, String...)})
>
> What does this redistribution mean and how does it work? Why have we lost
> joins result?
>
> Thank you
>



-- 
-- Guozhang