Posted to github@beam.apache.org by "robertwb (via GitHub)" <gi...@apache.org> on 2023/08/14 21:08:58 UTC

[GitHub] [beam] robertwb commented on pull request #27842: [#27839] Write PipelineOptions to a file instead of an environment variable.

robertwb commented on PR #27842:
URL: https://github.com/apache/beam/pull/27842#issuecomment-1678065266

   > > The change looks reasonable, but I'm curious why it's needed. Do environment variables have line length limits similar to command lines?
   > 
   > Per the issue (#27839), as far as the Linux kernel is concerned, environment variables are command-line arguments: both consume the same "resource" when a process is started. See this Stack Overflow question about it: https://stackoverflow.com/questions/28865473/setting-environment-variable-to-a-large-value-argument-list-too-long
   > 
   > As a result, the only way to avoid "Argument list too long" errors from the OS when starting a worker is to _not_ serialize the whole pipeline options into either the command line or an environment variable.
   > 
   > This has affected Dataflow customers on RunnerV2 who use the portable SDK containers, when the `filesToStage` value becomes excessively long due to Java reasons.
   > 
   > The legacy Java Dataflow worker (runnerv1) has done this since 2018, and the fix was never backported to the containers, since we mistakenly believed that environment variables would get around the problem.
   > 
   > They don't.
   
   Interesting. Thanks for digging into this. 
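
For anyone reading along, here is a minimal Python sketch of the point made in the quoted explanation. It is not the code from this PR. It only estimates how many bytes the argument and environment strings contribute when a process is started, and compares passing the options inline in an environment variable with writing them to a file and passing only the path. The helper names, the `PIPELINE_OPTIONS` and `PIPELINE_OPTIONS_FILE` variable names, the command line, and the file paths are assumptions made up for illustration.

```python
import json
import os
import tempfile


def exec_payload_bytes(argv, env):
    """Approximate bytes of argument plus environment strings passed on exec.

    Each string is counted with its trailing NUL byte. Pointer-array
    overhead is ignored, so this is an estimate, not the kernel's exact count.
    """
    arg_bytes = sum(len(os.fsencode(a)) + 1 for a in argv)
    # Each environment entry is stored as a single "KEY=value" string.
    env_bytes = sum(len(os.fsencode(f"{k}={v}")) + 1 for k, v in env.items())
    return arg_bytes + env_bytes


def options_via_file(options, env, directory=None):
    """Write options as JSON to a temp file; return (new_env, path).

    The returned environment is a copy of env that carries only the file path.
    """
    fd, path = tempfile.mkstemp(
        prefix="pipeline_options_", suffix=".json", dir=directory
    )
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(options, f)
    new_env = dict(env)
    new_env["PIPELINE_OPTIONS_FILE"] = path  # hypothetical variable name
    return new_env, path


if __name__ == "__main__":
    # Made-up options with a very long filesToStage list.
    options = {
        "filesToStage": [f"/tmp/staged/dep-{i:05d}.jar" for i in range(20000)]
    }
    argv = ["java", "-cp", "worker.jar", "WorkerMain"]  # placeholder command
    base_env = {"PATH": "/usr/bin"}

    # Hypothetical inline variant: the whole JSON blob lives in the environment.
    inline_env = dict(base_env, PIPELINE_OPTIONS=json.dumps(options))
    file_env, path = options_via_file(options, base_env)

    print("inline env payload:", exec_payload_bytes(argv, inline_env))
    print("file-based payload:", exec_payload_bytes(argv, file_env))
    if hasattr(os, "sysconf") and "SC_ARG_MAX" in os.sysconf_names:
        print("SC_ARG_MAX:", os.sysconf("SC_ARG_MAX"))
    os.remove(path)
```

With the file-based variant the payload no longer grows with the length of `filesToStage`, since only a short path is placed in the environment.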
   

