
Posted to dev@samza.apache.org by rick bolkey <ri...@bolkey.org> on 2016/01/03 04:00:31 UTC

Batch processing stream-stream joins

Hi all,

I'm looking for advice on how to set up a Samza job that does a
stream-stream join in a way that lets the code be reused for both streaming
and batch processing (batch meaning we re-hydrate our Kafka queue with
historical data).

It seems we need a way to tell our job that there is "no more data", via an
end-of-stream message. It also seems that any windowing aggregation that,
in streaming mode, assumes there is no more data would need to be disabled
in batch mode. I wasn't able to find much discussion on the topic, so I'm
looking for some pointers.

Thanks
Rick
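The end-of-stream approach described in the question above can be illustrated with a small sketch. It is plain Java with no Samza dependency, and every name in it (`StreamJoinSketch`, `process`, `endOfStream`, the `batchMode` flag, the "left"/"right" stream names) is hypothetical, not Samza's API. It assumes an inner join on a string key. In streaming mode, buffered records older than a time window are evicted. In batch (replay) mode that eviction is disabled, because a replayed stream may arrive far "ahead" of the other in event time, and join state is released only after every input has delivered an end-of-stream marker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Samza's API: inner join of two keyed streams
// ("left" and "right") usable in both streaming and batch (replay) mode.
class StreamJoinSketch {
    // One buffered record: its event time and payload.
    static final class Entry {
        final long timestampMs;
        final String value;

        Entry(long timestampMs, String value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final boolean batchMode;
    private final long windowMs;
    // Buffered records per input stream, then per join key.
    private final Map<String, Map<String, List<Entry>>> buffers = new HashMap<>();
    private final Set<String> finishedStreams = new HashSet<>();
    private final List<String> output = new ArrayList<>();

    StreamJoinSketch(boolean batchMode, long windowMs) {
        this.batchMode = batchMode;
        this.windowMs = windowMs;
        buffers.put("left", new HashMap<>());
        buffers.put("right", new HashMap<>());
    }

    // Handles one regular message from stream "left" or "right".
    void process(String stream, String key, long timestampMs, String value) {
        String other = stream.equals("left") ? "right" : "left";
        // Time-based eviction only makes sense while streaming; in batch mode
        // we keep everything until the end-of-stream markers arrive.
        if (!batchMode) {
            evictOlderThan(timestampMs - windowMs);
        }
        // Join the new record against everything buffered for the same key
        // on the other side, emitting "key:leftValue|rightValue".
        for (Entry e : buffers.get(other).getOrDefault(key, Collections.emptyList())) {
            output.add(stream.equals("left")
                    ? key + ":" + value + "|" + e.value
                    : key + ":" + e.value + "|" + value);
        }
        buffers.get(stream)
               .computeIfAbsent(key, k -> new ArrayList<>())
               .add(new Entry(timestampMs, value));
    }

    // Drops buffered records older than the cutoff (streaming mode only).
    private void evictOlderThan(long cutoffMs) {
        for (Map<String, List<Entry>> byKey : buffers.values()) {
            for (List<Entry> entries : byKey.values()) {
                entries.removeIf(e -> e.timestampMs < cutoffMs);
            }
        }
    }

    // Handles a (hypothetical) end-of-stream marker for one input.
    // Returns true once every input has finished.
    boolean endOfStream(String stream) {
        finishedStreams.add(stream);
        if (finishedStreams.size() == buffers.size()) {
            // No more data can arrive, so join state can be released safely.
            buffers.values().forEach(Map::clear);
            return true;
        }
        return false;
    }

    List<String> output() {
        return output;
    }
}
```

With a 1000 ms window, a left record at t=0 and a right record with the same key at t=5000 produce no match in streaming mode (the left record is evicted first), but they do join in batch mode. That difference is why window-based cleanup has to be switched off, or driven by end-of-stream markers, when the same job is fed replayed historical data. In a real Samza job this logic would presumably sit inside a `StreamTask`'s `process` method, with the marker detected from the incoming message.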

Re: Batch processing stream-stream joins

Posted by Yi Pan <ni...@gmail.com>.

Hi, Rick,

Please refer to the full discussion on SAMZA-552, which targets exactly
the issues you are considering. We have been working on the design and
prototype since last year. The work was paused for the last few months due
to other priorities, and we are planning to resume it this Q1. Feel free
to read and comment on SAMZA-552.

Thanks!

-Yi
