Posted to dev@samza.apache.org by Tommy Becker <to...@tivo.com> on 2015/07/23 22:23:27 UTC

Access to high-watermark from within a Samza job

I'm writing a Samza job that basically serves to pump data out of Kafka into another system.  For my particular use-case, I want to essentially process the entire topic as it exists when the job starts and then exit.  As far as I can tell, there doesn't seem to be a way to do that right now because it is impossible for the job to determine the high-watermark of the topics it's processing.  I found this issue that mentions adding a getHighWatermark() to IncomingMessageEnvelope:

https://issues.apache.org/jira/browse/SAMZA-539

The use-case discussed there seems to be metrics, but this API would enable mine as well.  This seems pretty trivial to add; is there some reason it hasn't been done yet?  If not, I can take a stab at it.  Or is there another way to do what I need that I'm unaware of?
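
To make the "process the topic as it exists at startup, then exit" idea concrete, here is a minimal sketch of the stop condition only. It deliberately uses no Samza classes: CatchUpTracker and its methods are hypothetical names, offsets are assumed to be numeric, and the high-watermark is assumed to mean "offset of the next message to be written", so a value of 0 means the partition was empty at startup. How the snapshot is obtained is left open.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: reports when every partition has reached its startup snapshot. */
final class CatchUpTracker {
    // Snapshot taken once at startup: partition id -> next offset to be written (assumed semantics).
    private final Map<Integer, Long> highWatermarks;
    // Partitions that still have unread messages below their snapshot.
    private final Set<Integer> pending = new HashSet<>();

    CatchUpTracker(Map<Integer, Long> highWatermarks) {
        this.highWatermarks = new HashMap<>(highWatermarks);
        for (Map.Entry<Integer, Long> e : this.highWatermarks.entrySet()) {
            // A partition that was empty at startup has nothing to catch up on.
            if (e.getValue() > 0) {
                pending.add(e.getKey());
            }
        }
    }

    /** Call after the message at {@code offset} in {@code partition} has been processed. */
    void onProcessed(int partition, long offset) {
        Long hw = highWatermarks.get(partition);
        // The last message that existed at startup sits at offset hw - 1.
        if (hw != null && offset + 1 >= hw) {
            pending.remove(partition);
        }
    }

    /** True once every partition has been read up to its startup snapshot. */
    boolean isCaughtUp() {
        return pending.isEmpty();
    }
}
```

In a job, the task would call onProcessed for each message and request a shutdown once isCaughtUp() returns true. If the getHighWatermark() accessor discussed in SAMZA-539 were added, it could be one source for the snapshot, but that wiring is not shown here.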

--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com
tobecker@tivo.com

________________________________

This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.

Re: Access to high-watermark from within a Samza job

Posted by Yan Fang <ya...@gmail.com>.
Hi Tommy,

It has not been implemented simply because no one has been working on it, not
for any other reason. :) If you want to take a stab at it, feel free. That
would be great.

(copycat use case? :)

Cheers,

Fang, Yan
yanfang724@gmail.com
