You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/08/27 17:08:57 UTC

[GitHub] [beam] lukecwik commented on pull request #12617: [BEAM-10670] Update Samza to be opt-out for SplittableDoFn powering the Read transform.

lukecwik commented on pull request #12617:
URL: https://github.com/apache/beam/pull/12617#issuecomment-682077852


   > > @lukecwik : Ke from samza side will help take a look. Thanks!
   > 
   > @kw2542 If we want to support unbounded splittable DoFns using the non-portable execution then we'll need to support [GBKIntoKeyedWorkItem](https://github.com/apache/beam/blob/ecfc389838400721b2a0379a9655969eed3dbf57/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java#L79).
   > 
   > I see that there is [KvToKeyedWorkItemOp](https://github.com/apache/beam/blob/master/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KvToKeyedWorkItemOp.java) but it doesn't output any timers that need to fire which is something that the underlying [splittable dofn implementation](https://github.com/apache/beam/blob/ecfc389838400721b2a0379a9655969eed3dbf57/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java#L235) is relying on. The timer firing seems to be done by [GroupByKeyOp](https://github.com/apache/beam/blob/ecfc389838400721b2a0379a9655969eed3dbf57/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/GroupByKeyOp.java#L225).
   > 
   > Is this something you can help me with? (feel free to open PRs against [my repo](https://github.com/lukecwik/incubator-beam/tree/beam10670.3) and or provide suggestions on this PR)
   
   I worked through the translation logic and was able to get unbounded splittable dofn tests to pass. The things that don't work are:
   * side inputs for unbounded splittable dofns
   * bundle finalization (was already unsupported) and the current UnboundedSourceSystem doesn't support finalizing checkpoints
   
   It also looks like I can't test unbounded splittable dofns in the global window since PAssert doesn't seem to work for Samza in an unbounded pipeline running in the global window. I can manually see that output is being produced via a log statement in the output manager.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org