Posted to github@beam.apache.org by "lostluck (via GitHub)" <gi...@apache.org> on 2023/04/18 22:52:51 UTC

[GitHub] [beam] lostluck opened a new issue, #26338: [Bug][Java]: SDF Wrapped Bounded sources don't implement progress tracking leading to poor performance.

lostluck opened a new issue, #26338:
URL: https://github.com/apache/beam/issues/26338

   ### What happened?
   
   While investigating the performance of Java pipelines on Dataflow Runner v2, we've determined that SDF-wrapped legacy Bounded Sources perform and scale poorly compared to the legacy Runner v1 worker, which interacts with the sources directly.
   
   There are 2 things to resolve in this issue.
   
   -----
   
   First:
   
   The [BoundedSourceAsSDFRestrictionTracker](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L332) does not implement the (optional) [HasProgress interface](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java#L144). Per the [RestrictionTracker documentation](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L332), omitting it can lead to poor auto-scaling and splitting performance.
   
   The [Unbounded counterpart](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L804) *does* implement [getProgress](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L968), which could serve as a basis for the bounded implementation.
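
As a rough illustration of the first point, the sketch below shows one way a bounded tracker could derive a (work completed, work remaining) pair from a reader's consumed fraction. It is self-contained and does not use Beam classes: `Progress` and `progressFromFraction` are hypothetical stand-ins, and treating an unknown fraction as "no work done" is an assumption of this sketch, not established Beam behavior.

```java
public class BoundedProgressSketch {

  /** Hypothetical stand-in for a (workCompleted, workRemaining) pair. */
  static final class Progress {
    final double workCompleted;
    final double workRemaining;

    Progress(double workCompleted, double workRemaining) {
      this.workCompleted = workCompleted;
      this.workRemaining = workRemaining;
    }
  }

  /**
   * Converts a consumed fraction in [0, 1] into a progress pair.
   * A null fraction models a reader that cannot estimate its position.
   */
  static Progress progressFromFraction(Double fractionConsumed) {
    if (fractionConsumed == null) {
      // Assumption of this sketch: unknown position is reported as nothing done.
      return new Progress(0.0, 1.0);
    }
    // Clamp so a misbehaving reader cannot report negative remaining work.
    double done = Math.max(0.0, Math.min(1.0, fractionConsumed));
    return new Progress(done, 1.0 - done);
  }

  public static void main(String[] args) {
    Progress p = progressFromFraction(0.25);
    System.out.println(p.workCompleted + " done, " + p.workRemaining + " remaining");
  }
}
```

A real implementation would have to decide how to report progress when the underlying reader gives no estimate; the fallback chosen here is only one option.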
   
   
   -----
   
   Second:
   
   The SplittableFnDataReceiver has different splitting characteristics vs normal DoFns:
   
   When computing progress for a split of an SDF whose tracker does not implement HasProgress, the [receiver returns 0](https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java#L1135). Note, however, that some anonymous implementations may override this (see elsewhere in FnApiDoFnRunner).
   
   This overrides the default behavior used for non-SDFs, which is to assume a progress of 0.5 [when determining a split](https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L262).
   
   That is, the estimated split progress differs between these two cases:
   for non-SDF => 0.5; for SDF without getProgress => 0.
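
To make the difference concrete, here is a simplified model (not the harness code) of how the assumed progress of the in-flight element changes the amount of work considered remaining when a split is requested. The class, method, and constant names are hypothetical; only the 0.5 and 0 defaults come from the description above.

```java
public class SplitProgressSketch {

  // Defaults described in the issue text.
  static final double NON_SDF_DEFAULT_PROGRESS = 0.5;
  static final double SDF_WITHOUT_PROGRESS = 0.0;

  /**
   * Work left in a bundle, measured in elements, where the in-flight element
   * at position index is counted as partially done.
   */
  static double remainingWork(long index, long stopIndex, double currentElementProgress) {
    return (stopIndex - index) - currentElementProgress;
  }

  public static void main(String[] args) {
    // A bundle holding a single element, e.g. one large wrapped source.
    System.out.println(remainingWork(0, 1, NON_SDF_DEFAULT_PROGRESS));
    System.out.println(remainingWork(0, 1, SDF_WITHOUT_PROGRESS));
  }
}
```

In this model, an assumed progress of 0 means the whole in-flight element is treated as remaining, no matter how much of it has actually been read.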
   
   
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on issue #26338: [Bug][Java]: SDF Wrapped Bounded sources don't implement progress tracking leading to poor performance.

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #26338:
URL: https://github.com/apache/beam/issues/26338#issuecomment-1513883532

   cc: @kennknowles @robertwb @malo-denielou 


[GitHub] [beam] chamikaramj closed issue #26338: [Bug][Java]: SDF Wrapped Bounded sources don't implement progress tracking leading to poor performance.

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj closed issue #26338: [Bug][Java]: SDF Wrapped Bounded sources don't implement progress tracking leading to poor performance.
URL: https://github.com/apache/beam/issues/26338

