You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Minreng Wu <wu...@gmail.com> on 2020/08/24 18:05:58 UTC

How to integrate Beam SQL windowing query with KafkaIO?

Hi contributors,

Sorry to bother you! I met a problem when I was trying to apply a windowing
aggregation Beam SQL query to a Kafka input source.

The details of the question are in the following link:
https://stackoverflow.com/questions/63566057/how-to-integrate-beam-sql-windowing-query-with-kafkaio.
And the version of the Beam Java SDK I used is *2.23.0*

Really appreciate your help and advice! Stay safe and happy!

Thanks and regards,
Minreng

Re: How to integrate Beam SQL windowing query with KafkaIO?

Posted by Rui Wang <ru...@google.com>.
Glad it has worked!  So sounds like data has been dropped as they are
considered late data and `.withAllowedLateness()` make the data emitted.


-Rui

On Thu, Aug 27, 2020 at 10:09 AM Minreng Wu <wu...@gmail.com> wrote:

> Hi Rui,
>
> Thanks for your advice!
>
> After reading Chapter 2&3 of *Streaming Systems* and some other
> materials, eventually I make it work! It indeed turned out to be an issue
> of not setting the trigger correctly. Previously, I didn't set the trigger
> & watermark so it would use the default settings. After I added
> `.withAllowedLateness()`, it can correctly materialize the window output as
> expected. Thank you so much for your help!
>
> Thanks & Regards,
> Minreng
>
>
> On Mon, Aug 24, 2020 at 1:58 PM Rui Wang <ru...@google.com> wrote:
>
>> Hi,
>>
>> I checked the query in your SO question and I think the SQL usage is
>> correct.
>>
>> My current guess is that the problem is how does watermark generate and
>> advance in KafkaIO. It could be either the watermark didn't pass the end of
>> your SQL window for aggregation or the data was lagging behind the
>> watermark so they are considered late data.
>>
>> One way to verify it is you can try to use TestStream as the source to
>> evaluate your pipeline and see whether it works well.
>>
>> -Rui
>>
>> On Mon, Aug 24, 2020 at 11:06 AM Minreng Wu <wu...@gmail.com> wrote:
>>
>>> Hi contributors,
>>>
>>> Sorry to bother you! I met a problem when I was trying to apply a
>>> windowing aggregation Beam SQL query to a Kafka input source.
>>>
>>> The details of the question are in the following link:
>>> https://stackoverflow.com/questions/63566057/how-to-integrate-beam-sql-windowing-query-with-kafkaio.
>>> And the version of the Beam Java SDK I used is *2.23.0*
>>>
>>> Really appreciate your help and advice! Stay safe and happy!
>>>
>>> Thanks and regards,
>>> Minreng
>>>
>>

Re: How to integrate Beam SQL windowing query with KafkaIO?

Posted by Minreng Wu <wu...@gmail.com>.
Hi Rui,

Thanks for your advice!

After reading Chapter 2&3 of *Streaming Systems* and some other materials,
eventually I make it work! It indeed turned out to be an issue of not
setting the trigger correctly. Previously, I didn't set the trigger &
watermark so it would use the default settings. After I added
`.withAllowedLateness()`, it can correctly materialize the window output as
expected. Thank you so much for your help!

Thanks & Regards,
Minreng


On Mon, Aug 24, 2020 at 1:58 PM Rui Wang <ru...@google.com> wrote:

> Hi,
>
> I checked the query in your SO question and I think the SQL usage is
> correct.
>
> My current guess is that the problem is how does watermark generate and
> advance in KafkaIO. It could be either the watermark didn't pass the end of
> your SQL window for aggregation or the data was lagging behind the
> watermark so they are considered late data.
>
> One way to verify it is you can try to use TestStream as the source to
> evaluate your pipeline and see whether it works well.
>
> -Rui
>
> On Mon, Aug 24, 2020 at 11:06 AM Minreng Wu <wu...@gmail.com> wrote:
>
>> Hi contributors,
>>
>> Sorry to bother you! I met a problem when I was trying to apply a
>> windowing aggregation Beam SQL query to a Kafka input source.
>>
>> The details of the question are in the following link:
>> https://stackoverflow.com/questions/63566057/how-to-integrate-beam-sql-windowing-query-with-kafkaio.
>> And the version of the Beam Java SDK I used is *2.23.0*
>>
>> Really appreciate your help and advice! Stay safe and happy!
>>
>> Thanks and regards,
>> Minreng
>>
>

Re: How to integrate Beam SQL windowing query with KafkaIO?

Posted by Rui Wang <ru...@google.com>.
Hi,

I checked the query in your SO question and I think the SQL usage is
correct.

My current guess is that the problem is how does watermark generate and
advance in KafkaIO. It could be either the watermark didn't pass the end of
your SQL window for aggregation or the data was lagging behind the
watermark so they are considered late data.

One way to verify it is you can try to use TestStream as the source to
evaluate your pipeline and see whether it works well.

-Rui

On Mon, Aug 24, 2020 at 11:06 AM Minreng Wu <wu...@gmail.com> wrote:

> Hi contributors,
>
> Sorry to bother you! I met a problem when I was trying to apply a
> windowing aggregation Beam SQL query to a Kafka input source.
>
> The details of the question are in the following link:
> https://stackoverflow.com/questions/63566057/how-to-integrate-beam-sql-windowing-query-with-kafkaio.
> And the version of the Beam Java SDK I used is *2.23.0*
>
> Really appreciate your help and advice! Stay safe and happy!
>
> Thanks and regards,
> Minreng
>