Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 20:50:35 UTC

[GitHub] [beam] damccorm opened a new issue, #20979: Portable runners should be able to issue checkpoints to Splittable DoFn

damccorm opened a new issue, #20979:
URL: https://github.com/apache/beam/issues/20979

   To execute an unbounded Splittable DoFn over the Fn API in streaming mode properly, portable runners should regularly send the SDK either a split (a ProcessBundleSplitRequest with fraction_of_remainder > 0) or simply a checkpoint (a ProcessBundleSplitRequest with fraction_of_remainder == 0), so that the current bundle finishes processing instead of running forever.
   
   Imported from Jira [BEAM-11998](https://issues.apache.org/jira/browse/BEAM-11998). Original Jira may contain additional context.
   Reported by: boyuanz.
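
As an illustration of the behavior requested above, here is a minimal sketch of a runner-side loop that sends a checkpoint at a fixed interval until the bundle finishes. The two dataclasses are simplified stand-ins that only mirror the field names of the Fn API messages; `checkpoint_request`, `run_bundle_with_checkpoints`, and the `send_split` / `bundle_done` callbacks are hypothetical names invented for this sketch and are not part of any Beam runner.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DesiredSplit:
    # Fraction of the remaining work that the SDK should keep in the current
    # bundle. 0.0 asks for a checkpoint: keep nothing more, return the rest.
    fraction_of_remainder: float


@dataclass
class ProcessBundleSplitRequest:
    # Simplified stand-in for the Fn API message (assumption: only the two
    # fields needed here are modeled).
    instruction_id: str
    desired_splits: Dict[str, DesiredSplit] = field(default_factory=dict)


def checkpoint_request(instruction_id: str, transform_id: str) -> ProcessBundleSplitRequest:
    """Build a split request with fraction_of_remainder == 0 (a checkpoint)."""
    return ProcessBundleSplitRequest(
        instruction_id, {transform_id: DesiredSplit(fraction_of_remainder=0.0)}
    )


def run_bundle_with_checkpoints(
    send_split: Callable[[ProcessBundleSplitRequest], None],
    bundle_done: Callable[[], bool],
    instruction_id: str,
    transform_id: str,
    interval_secs: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send a checkpoint every interval_secs until the bundle completes.

    Returns the number of checkpoint requests that were sent.
    """
    sent = 0
    while not bundle_done():
        sleep(interval_secs)
        if bundle_done():
            break  # The bundle finished on its own while we were waiting.
        send_split(checkpoint_request(instruction_id, transform_id))
        sent += 1
    return sent
```

A real runner would also need to handle the split response (primary and residual roots) and reschedule the residual work, which this sketch leaves out.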


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #20979: Portable runners should be able to issue checkpoints to Splittable DoFn

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #20979:
URL: https://github.com/apache/beam/issues/20979#issuecomment-1452594981

   A related problem: when running PeriodicSequence on the Flink runner, the pipeline runs for about 1 minute and then fails with the following error:
   ```
   Traceback (most recent call last):
     File ".../periodictest.py", line 82, in <module>
       test0(p)
     File ".../py38beam/lib/python3.8/site-packages/apache_beam/pipeline.py", line 601, in __exit__
       self.result.wait_until_finish()
     File "...py38beam/lib/python3.8/site-packages/apache_beam/runners/portability/portable_runner.py", line 614, in wait_until_finish
       raise self._runtime_exception
   RuntimeError: Pipeline ...-6ce3c5fde435 failed in state FAILED: java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 100.81.162.100:53640-99d457 timed out.
   ```




[GitHub] [beam] kennknowles commented on issue #20979: Portable runners should be able to issue checkpoints to Splittable DoFn

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #20979:
URL: https://github.com/apache/beam/issues/20979#issuecomment-1246039291

   Since this is new functionality, I think that P2 is the right level. This is still an important priority for making portable runners function properly with SDF.




[GitHub] [beam] kennknowles commented on issue #20979: Portable runners should be able to issue checkpoints to Splittable DoFn

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #20979:
URL: https://github.com/apache/beam/issues/20979#issuecomment-1246039478

   CC @chamikaramj since tagged with xlang




[GitHub] [beam] LemonU commented on issue #20979: Portable runners should be able to issue checkpoints to Splittable DoFn

Posted by GitBox <gi...@apache.org>.
LemonU commented on issue #20979:
URL: https://github.com/apache/beam/issues/20979#issuecomment-1219575981

   I have managed to get the workaround mentioned in the original Jira ticket working, which is to add `--experiments=use_deprecated_read`.
   
   First, you'll need to start the Java expansion service yourself. If you are deploying the expansion service on Docker, you can simply pull the Flink job server image (I used `apache/beam_flink1.14_job_server:2.40.0`) and override the entrypoint with the following command:
   ```
   java -cp /opt/apache/beam/jars/* org.apache.beam.sdk.expansion.service.ExpansionService 8097  \
   --javaClassLookupAllowlistFile="*" \
   --defaultEnvironmentType=<your environment type here> \
   --defaultEnvironmentConfig=<your environment config here> \
   --experiments=use_deprecated_read
   ```
   
   Explanation of each flag:
   
   `-cp /opt/apache/beam/jars/*`: this is where the expansion service jars are located in the container
   
   `8097`: this specifies the port the expansion service listens on
   
   `--javaClassLookupAllowlistFile="*"`: this is so that all transforms registered under the expansion service can be requested for external expansion
   
   `--defaultEnvironmentType=<your environment type here>` and `--defaultEnvironmentConfig=<your environment config here>`: these specify the `Environment` in which the Java transforms you request from this expansion service will be executed. Be advised that your pipeline's environment options do not affect these values: for the expanded transforms, the values set on the expansion service take precedence over your pipeline's.
   For example, suppose you run a Python pipeline with `--environment_type=EXTERNAL --environment_config=localhost:50000`, the expansion service is started with `--defaultEnvironmentType=DOCKER`, and you request expansion of the Kafka IO transforms from it. In the resulting pipeline Protobuf payload, every stage's environment is set to `EXTERNAL` except the Kafka IO transforms requested from the expansion service, which are set to `DOCKER`.
   
   `--experiments=use_deprecated_read`: this makes the expansion service replace the new SDF-based Kafka Read transform with the legacy `Read` transform when it expands the Kafka IO stage.
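
The environment behavior described above can be pictured with a small toy model. This is only a sketch of the outcome reported in the comment, not Beam code: `Environment`, `Stage`, and `assign_environments` are hypothetical names, and the stage names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Environment:
    env_type: str
    config: str = ""


@dataclass
class Stage:
    name: str
    # True if this stage was produced by the expansion service.
    from_expansion_service: bool


def assign_environments(
    stages: List[Stage], pipeline_env: Environment, expansion_env: Environment
) -> Dict[str, Environment]:
    """Map each stage name to the environment it ends up with.

    Expanded transforms get the expansion service's default environment;
    every other stage gets the pipeline's own environment.
    """
    return {
        s.name: expansion_env if s.from_expansion_service else pipeline_env
        for s in stages
    }


# The scenario from the comment: an EXTERNAL pipeline, a DOCKER expansion service.
pipeline_env = Environment("EXTERNAL", "localhost:50000")
expansion_env = Environment("DOCKER")
stages = [
    Stage("ParseRecords", from_expansion_service=False),  # hypothetical Python stage
    Stage("KafkaRead", from_expansion_service=True),      # expanded Java transform
]
result = assign_environments(stages, pipeline_env, expansion_env)
```

Here `result["ParseRecords"]` carries the `EXTERNAL` environment and `result["KafkaRead"]` carries `DOCKER`, matching the mixed-environment payload described in the comment.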




[GitHub] [beam] Abacn commented on issue #20979: Portable runners should be able to issue checkpoints to Splittable DoFn

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #20979:
URL: https://github.com/apache/beam/issues/20979#issuecomment-1416112322

   This is coming up again as we push forward with SDF implementations for various sources (Kafka, GenerateSequence, etc.) and move toward Runner v2 for the Dataflow runner. I have read the context of BEAM-11998 and would like to work on this.

