Posted to user@beam.apache.org by Terry Dhariwal <dh...@pythian.com> on 2018/06/15 09:13:50 UTC

Apache Beam/Dataflow - using custom message id and timestamp

Hello beam team,

I'm reading from a Kafka topic and writing to Pubsub. The code constructs
PubsubMessage objects from the Kafka messages.

I'm setting some attributes on the PubsubMessage objects, namely message_id
and message_timestamp2.
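
For reference, the attribute map is built roughly like this (a plain-Java
sketch; kafkaKey and epochMillis are illustrative names, not my actual code).
Note that PubsubIO expects the timestamp attribute value to be either
milliseconds since the Unix epoch or an RFC 3339 string:

```java
import java.util.HashMap;
import java.util.Map;

public class PubsubAttributes {
    // Build the attribute map attached to each PubsubMessage.
    // withTimestampAttribute requires the value to be millis since
    // the epoch (as a decimal string) or an RFC 3339 timestamp.
    public static Map<String, String> buildAttributes(String kafkaKey,
                                                      long epochMillis) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("message_id", kafkaKey);                            // dedup id
        attrs.put("message_timestamp2", Long.toString(epochMillis)); // event time
        return attrs;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = buildAttributes("order-42", 1529054030000L);
        System.out.println(attrs.get("message_id"));         // prints "order-42"
        System.out.println(attrs.get("message_timestamp2")); // prints "1529054030000"
    }
}
```

The map is then passed to the PubsubMessage constructor alongside the payload.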

I want to use these attribute values downstream, so that subscribing
Dataflow jobs use message_id for exactly-once processing and
message_timestamp2 for windowing. Therefore, when I write to Pubsub, I do
this:

PubsubIO.writeMessages()
        .withIdAttribute("message_id")
        .withTimestampAttribute("message_timestamp2")
        .to(options.getOutputTopic())

However, instead of propagating these attribute values downstream,
Dataflow is overwriting the attributes with the default message_id
and the processing time.

What am I doing wrong?

Pythian

          love your data


*Terry Dhariwal* | Solutions Architect
*t  *+44 (0) 7460 877 412
*e  *dhariwal@pythian.com
*w* www.pythian.com
