Posted to user@beam.apache.org by Terry Dhariwal <dh...@pythian.com> on 2018/06/15 09:13:50 UTC
Apache Beam/Dataflow - using custom message id and timestamp
Hello beam team,
I'm reading from a Kafka topic and writing to Pubsub. The code constructs
PubsubMessage objects from the Kafka messages and sets two attributes on
each message: message_id and message_timestamp2.
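For illustration, here is a minimal sketch of how those attributes might be populated from a Kafka record. The helper name, the use of the Kafka key as the id, and the field names are assumptions from my description above; note that for PubsubIO's withTimestampAttribute to parse the value, it must be either milliseconds since the Unix epoch or an RFC 3339 string.

```java
import java.util.HashMap;
import java.util.Map;

public class PubsubAttributes {
    // Build the attribute map that will be attached to a PubsubMessage.
    // kafkaKey is assumed unique per message; kafkaTimestampMillis is the
    // Kafka record timestamp (epoch milliseconds).
    public static Map<String, String> toAttributes(String kafkaKey,
                                                   long kafkaTimestampMillis) {
        Map<String, String> attrs = new HashMap<>();
        // Used by withIdAttribute("message_id") for deduplication downstream.
        attrs.put("message_id", kafkaKey);
        // Used by withTimestampAttribute("message_timestamp2") for windowing;
        // epoch millis as a decimal string is an accepted format.
        attrs.put("message_timestamp2", Long.toString(kafkaTimestampMillis));
        return attrs;
    }
}
```

The map is then passed to the PubsubMessage constructor along with the payload bytes.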
I want to use these attribute values downstream, so that subscribing
Dataflow jobs use message_id for exactly-once processing and
message_timestamp2 for windowing. Therefore, when I write to Pubsub, I do
this:
PubsubIO.writeMessages()
    .withIdAttribute("message_id")
    .withTimestampAttribute("message_timestamp2")
    .to(options.getOutputTopic())
However, instead of propagating these attribute values downstream,
Dataflow is overwriting the attributes with the default message id
and the processing time.
What am I doing wrong?
Pythian
love your data
*Terry Dhariwal* | Solutions Architect
*t *+44 (0) 7460 877 412
*e *dhariwal@pythian.com
*w* www.pythian.com