Posted to dev@beam.apache.org by ZAFFALON Mattia - NTTDATA <ma...@skytv.it> on 2021/06/17 14:45:31 UTC

CometD IO Connector for Beam

Hi,

I'm reaching out to ask a couple of questions about whether a connector for CometD (https://docs.cometd.org/) exists and, if so, what state it is in, especially for CometD used as a publish/subscribe mechanism.

You may already have heard of CometD but, to give some context for this email, it is how Salesforce delivers change data capture events to other systems. Here is the documentation from the Salesforce developer portal: https://developer.salesforce.com/docs/atlas.en-us.218.0.change_data_capture.meta/change_data_capture/cdc_intro.htm, and this is the section of the CometD documentation that describes the architecture of the publish/subscribe interaction available in CometD: https://docs.cometd.org/current7/reference/#_concepts_channels_broadcast.

There is a fairly straightforward example client (from Salesforce, maintained by the community) in this repo: https://github.com/forcedotcom/EMP-Connector.

Given that the job I'm going to build seems to fit with the Beam/Dataflow streaming job model, I'd like to ask:

  *   I could not find anything on the web or on GitHub about a Beam CometD connector. Do you know if anything is in progress?
  *   Since I'm planning to build the connector (at least the Source part), could you please give me some advice on whether to use a SplittableDoFn (as the Beam documentation seems to suggest) or an UnboundedSource? It seems to me that each subscription to a CometD channel can have at most one consumer, so I don't know whether a splittable DoFn can be used here. An UnboundedSource with just one split, on the other hand, seems to me a much better fit.
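
To make the single-split idea concrete, here is a minimal sketch in plain Python. It does not use the Beam API: `SingleChannelReader`, `Checkpoint` and `split_channel` are hypothetical names that only mirror the shape I have in mind (one reader per channel, a checkpoint that records the last event read, and a split step that ignores the requested parallelism). It assumes that events carry a positive, increasing replay id and that a subscription can be resumed after a given id; the in-memory list stands in for the real CometD subscription.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

Event = Tuple[int, str]  # (replay id, payload); assumed shape, for illustration only


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint: replay id of the last event handed downstream."""
    last_replay_id: int


class SingleChannelReader:
    """Models the single consumer of one channel subscription (not a Beam class)."""

    def __init__(self, events: Iterable[Event],
                 checkpoint: Optional[Checkpoint] = None) -> None:
        # Assumption: replay ids are positive, so 0 means "nothing read yet".
        self._last_replay_id = checkpoint.last_replay_id if checkpoint else 0
        self._events: Iterator[Event] = iter(events)
        self._current: Optional[Event] = None

    def advance(self) -> bool:
        """Move to the next event newer than the checkpoint; False if none is left."""
        for replay_id, payload in self._events:
            if replay_id > self._last_replay_id:
                self._current = (replay_id, payload)
                self._last_replay_id = replay_id
                return True
        return False

    def current(self) -> Event:
        if self._current is None:
            raise ValueError("advance() has not returned True yet")
        return self._current

    def checkpoint(self) -> Checkpoint:
        """Snapshot that a restarted reader can resume from."""
        return Checkpoint(self._last_replay_id)


def split_channel(channel: str, desired_num_splits: int) -> List[str]:
    """Always one split per channel, whatever parallelism is requested."""
    return [channel]
```

A restarted reader built with the last checkpoint skips the events that were already read, which is the behaviour I would expect from a source with exactly one split per channel.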

Would you please share your ideas?

Best regards,
Mattia