Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/05/04 23:52:46 UTC

[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource API.
   
   Unfortunately, regular DoFns don't cut it because a single element's outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing.
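
To make the idea of splitting up the listing concrete, here is a minimal sketch in plain Java with no Beam dependency. It assumes the listing can be restricted to a half-open numeric range (for example, a range of message timestamps); that assumption, and the names `ListingRangeSplitter`, `Range`, and `split`, are hypothetical and not taken from this PR. Each sub-range could then be listed as its own unit of work instead of in one call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: cuts one large listing range into smaller sub-ranges. */
final class ListingRangeSplitter {

  /** A half-open range [from, to), e.g. epoch millis of a message timestamp (assumed). */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + ")";
    }
  }

  /** Splits [start, end) into consecutive sub-ranges that each span at most `chunk` units. */
  static List<Range> split(long start, long end, long chunk) {
    if (chunk <= 0) {
      throw new IllegalArgumentException("chunk must be positive");
    }
    List<Range> ranges = new ArrayList<>();
    for (long from = start; from < end; from += chunk) {
      // The last sub-range is clamped so it never extends past `end`.
      ranges.add(new Range(from, Math.min(from + chunk, end)));
    }
    return ranges;
  }
}
```

With this shape, each `Range` becomes a separate input element, so no single call has to emit every message in the store.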
   
   I originally chose splittable DoFn over BoundedSource based on the sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   The blog also mentions:
   - A Source cannot emit an additional output (for example, records that failed to parse).
       - Healthcare customers feeding requirements into this plugin want a DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read`, I wanted to provide a DLQ in `HL7v2IO.ListMessages`. However, I believe this is more of a nice-to-have for batch use cases (because there's no room for passing `ListMessages` bad message IDs like there is in `HL7v2IO.Read`).
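
As an illustration of the "additional output" a DLQ needs, the following is a minimal plain-Java sketch of the dead-letter pattern, again without Beam. The names `DeadLetterSketch`, `Result`, and `parseAll` are hypothetical, and the parser is a stand-in for whatever message parsing the real connector does: records that parse go to the main output, and records that throw are set aside rather than failing the whole batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a dead-letter split; not the HL7v2IO implementation. */
final class DeadLetterSketch {

  /** Holds the main output and the dead-letter output side by side. */
  static final class Result<T> {
    final List<T> parsed = new ArrayList<>();
    final List<String> deadLetters = new ArrayList<>();
  }

  /** Applies `parser` to each raw record; records whose parse throws go to deadLetters. */
  static <T> Result<T> parseAll(List<String> raw, Function<String, T> parser) {
    Result<T> result = new Result<>();
    for (String record : raw) {
      try {
        result.parsed.add(parser.apply(record));
      } catch (RuntimeException e) {
        // Keep the bad record for later inspection instead of aborting the batch.
        result.deadLetters.add(record);
      }
    }
    return result;
  }
}
```

In a Beam pipeline the two lists would correspond to a main output and an additional tagged output, which is the capability the blog says a Source lacks.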


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org