Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 18:03:48 UTC

[GitHub] [beam] damccorm opened a new issue, #20530: Make non-portable Splittable DoFn the only option when executing Java "Read" transforms

damccorm opened a new issue, #20530:
URL: https://github.com/apache/beam/issues/20530

   All runners seem capable of migrating to splittable DoFn for non-portable execution, except for Dataflow runner v1, which will internalize the current primitive read implementation that is shared across runner implementations.
   
   Imported from Jira [BEAM-10670](https://issues.apache.org/jira/browse/BEAM-10670). Original Jira may contain additional context.
   Reported by: lcwik.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #20530: Make non-portable Splittable DoFn the only option when executing Java "Read" transforms

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #20530:
URL: https://github.com/apache/beam/issues/20530#issuecomment-1736280975

   I don't think anyone is actively pursuing this goal at the moment. I think that the portable FlinkRunner is the one that has splittable DoFn support. They are pretty independent runners, I believe.




Re: [I] Make non-portable Splittable DoFn the only option when executing Java "Read" transforms [beam]

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #20530:
URL: https://github.com/apache/beam/issues/20530#issuecomment-1877702172

   The current (bad) status is that all non-Dataflow runners will use legacy read _if_ the runner is set up prior to expansion. This results in non-portable expansion behaviors.
   
   The desired status would be that runners override the SDF read to legacy read if desired. The code to do this is already shipped with KafkaIO and used in the Dataflow runner, but it would be some real work, and probably throwaway work, to adjust other runners to use the override. More likely we just push everything to SDF.
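
The "desired status" described above can be sketched as a toy model: expansion always produces the SDF form of a read without consulting the runner, and a runner that prefers the legacy read rewrites the expanded pipeline itself. This is only an illustration under stated assumptions. Every name below (`ReadOverrideSketch`, `ReadKind`, `Node`, `expand`, `overrideToLegacy`) is hypothetical and is not a Beam class or method; the real override code mentioned in the comment is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types are Beam classes. */
class ReadOverrideSketch {

  /** The two shapes a read can take in this toy model. */
  enum ReadKind { SDF_READ, LEGACY_READ }

  /** A pipeline node: a source name plus how its read is currently expressed. */
  static final class Node {
    final String name;
    final ReadKind kind;

    Node(String name, ReadKind kind) {
      this.name = name;
      this.kind = kind;
    }

    @Override
    public String toString() {
      return name + ":" + kind;
    }
  }

  /** Expansion never consults the runner: every read expands to the SDF form. */
  static List<Node> expand(List<String> sources) {
    List<Node> nodes = new ArrayList<>();
    for (String source : sources) {
      nodes.add(new Node(source, ReadKind.SDF_READ));
    }
    return nodes;
  }

  /** A runner that prefers legacy reads rewrites the already-expanded pipeline. */
  static List<Node> overrideToLegacy(List<Node> expanded) {
    List<Node> rewritten = new ArrayList<>();
    for (Node node : expanded) {
      rewritten.add(new Node(node.name, ReadKind.LEGACY_READ));
    }
    return rewritten;
  }

  public static void main(String[] args) {
    // Expansion result is identical for every runner in this model.
    List<Node> expanded = expand(Arrays.asList("kafka-source"));
    System.out.println("after expansion: " + expanded);
    // Only a runner that opts in applies the override afterwards.
    System.out.println("after runner override: " + overrideToLegacy(expanded));
  }
}
```

The point of the split is that the expanded pipeline is the same regardless of which runner is configured or when, so the choice of read implementation becomes a runner-side decision rather than an expansion-time one.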




[GitHub] [beam] aditiwari01 commented on issue #20530: Make non-portable Splittable DoFn the only option when executing Java "Read" transforms

Posted by "aditiwari01 (via GitHub)" <gi...@apache.org>.
aditiwari01 commented on issue #20530:
URL: https://github.com/apache/beam/issues/20530#issuecomment-1504760474

   Hi @damccorm 
   
   I am trying to use KafkaIO with the FlinkRunner but am hitting the following error:
   
   ```
   Exception in thread "main" java.lang.IllegalStateException: No translator known for org.apache.beam.runners.core.construction.SplittableParDo$PrimitiveUnboundedRead
   	at org.apache.beam.runners.core.construction.PTransformTranslation.urnForTransform(PTransformTranslation.java:283)
   	at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:135)
   	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:593)
   	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)
   	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)
   	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)
   	at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$500(TransformHierarchy.java:240)
   	at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:214)
   	at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:469)
   	at org.apache.beam.runners.flink.FlinkPipelineTranslator.translate(FlinkPipelineTranslator.java:38)
   	at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.translate(FlinkStreamingPipelineTranslator.java:92)
   	at org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:115)
   	at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:105)
   	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:323)
   	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:309)
   	at BeamPipelineKafka.main(BeamPipelineKafka.java:54)
   ```
   
   Since you mentioned that all runners are capable of Splittable DoFn, is there anything I am missing?
   
   I have also tried `"--experiments=use_deprecated_read"` to force the primitive read, but I still hit the same error.




[GitHub] [beam] kennknowles closed issue #20530: Make non-portable Splittable DoFn the only option when executing Java "Read" transforms

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles closed issue #20530: Make non-portable Splittable DoFn the only option when executing Java "Read" transforms
URL: https://github.com/apache/beam/issues/20530

