Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/06/25 03:22:06 UTC

[GitHub] [beam] kw2542 opened a new pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

kw2542 opened a new pull request #15081:
URL: https://github.com/apache/beam/pull/15081


   Add process support for ExternalWorkerService to support artifact staging in worker pool mode.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-878521152


   Each Python SDK process instance is capable of running multiple work items
   in parallel already. The issue is that the Python GIL will limit it to use
   a single CPU core which is why multiple Python SDK process instances are
   launched. Whether they are launched by boot.go or someone else isn't too
   important.
   
   The prepare step sounds great for the external pool mode as well since that
   is what we want for docker for Apache Beam as well.
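
To make the "multiple SDK process instances" idea concrete, here is a minimal Java sketch that prepares one `ProcessBuilder` per worker. It is only an illustration: the class name, the executable path, and the `--id` argument are assumptions made for this example, not the code in this PR, and nothing is actually started.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkerProcessLauncher {
  /** Builds one ProcessBuilder per worker; no process is started here. */
  public static List<ProcessBuilder> buildWorkers(String executable, int count) {
    List<ProcessBuilder> builders = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      // "--id=..." is an assumed argument format used only for illustration.
      ProcessBuilder builder = new ProcessBuilder(executable, "--id=worker-" + i);
      builder.inheritIO(); // a started worker would share this process's stdout/stderr
      builders.add(builder);
    }
    return builders;
  }

  public static void main(String[] args) {
    // One worker process per core, since each process gets its own interpreter.
    int cores = Runtime.getRuntime().availableProcessors();
    // The path below is a placeholder, not a verified install location.
    for (ProcessBuilder builder : buildWorkers("/path/to/boot", cores)) {
      System.out.println(String.join(" ", builder.command()));
    }
  }
}
```

Calling `start()` on each builder would launch the processes; who performs that launch (boot.go or the worker pool) does not change the shape of the code.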
   
   
   On Mon, Jul 12, 2021 at 11:39 AM Ke Wu ***@***.***> wrote:
   
   > I am curious why artifact staging does not work with threads? I wonder if
   > we should fix that instead of introducing yet more complexity to this
   > already complex API.
   > In Python, I thought we used processes instead of threads because of the
   > GIL. But Java has no GIL, so I'm not sure there is an advantage to using
   > processes.
   >
   > Using threads still makes sense for IO bound tasks in Python since Python
   > can parallelize IO effectively. Python's GIL is problematic for CPU bound
   > tasks.
   >
   > @lukecwik <https://github.com/lukecwik> @ibzib <https://github.com/ibzib>
   > Correct me if I am wrong: my understanding is that we use process mode
   > mainly because we can simplify the workflow by reusing the boot executable,
   > which can only be executed in a subprocess rather than a thread. In addition,
   > the boot executable starts the actual worker in a subprocess too.
   >
   > It is true that we could implement a new workflow to support thread mode
   > instead of relying on the boot executable, but it could be much more
   > significant work. Let me know if you think it is worth the effort.
   >
   > In addition, I am wondering if we could add a prepare step in external
   > pool mode, so that we would not need to run artifact staging for each
   > start worker request. WDYT?
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-870098738


   Run Java PreCommit





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-876835418


   > @kw2542 - What is the next step for this PR?
   
   Thanks for checking in. I am currently on vacation and will address the feedback once I am back next week.





[GitHub] [beam] aaltay commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
aaltay commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-876819832


   @kw2542 - What is the next step for this PR?





[GitHub] [beam] kw2542 commented on a change in pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on a change in pull request #15081:
URL: https://github.com/apache/beam/pull/15081#discussion_r661870261



##########
File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
##########
@@ -79,13 +79,16 @@
 
   private static final String dockerContainerImageOption = "docker_container_image";
   private static final String externalServiceAddressOption = "external_service_address";
+  private static final String externalServiceExecutableOption = "external_service_executable";

Review comment:
       This makes sense: the runner should be ignorant of how the external worker executes. I do see that this is the pattern the Python external worker follows.







[GitHub] [beam] lukecwik commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-872386593


   > I am curious why artifact staging does not work with threads? I wonder if we should fix that instead of introducing yet more complexity to this already complex API.
   > 
   > In Python, I thought we used processes instead of threads because of the GIL. But Java has no GIL, so I'm not sure there is an advantage to using processes.
   
   Using threads still makes sense for IO bound tasks in Python since Python can parallelize IO effectively. Python's GIL is problematic for CPU bound tasks.
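
For contrast, a small Java sketch of thread mode: because the JVM has no GIL, CPU-bound work items submitted to a thread pool can run on several cores within one process. The class and method names are invented for this example, and `sumRange` is just a stand-in for a CPU-bound work item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadModeSketch {
  /** CPU-bound stand-in for a work item: sums the integers in [from, to). */
  public static long sumRange(long from, long to) {
    long total = 0;
    for (long i = from; i < to; i++) {
      total += i;
    }
    return total;
  }

  /** Splits [0, n) into one chunk per thread and sums the chunks on a thread pool. */
  public static long parallelSum(long n, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Long>> parts = new ArrayList<>();
      long chunk = (n + threads - 1) / threads; // ceiling division
      for (int t = 0; t < threads; t++) {
        final long from = Math.min(n, t * chunk);
        final long to = Math.min(n, from + chunk);
        Callable<Long> task = () -> sumRange(from, to);
        parts.add(pool.submit(task));
      }
      long total = 0;
      for (Future<Long> part : parts) {
        total += part.get();
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("work item failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(parallelSum(1_000_000L, 4)); // prints 499999500000
  }
}
```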





[GitHub] [beam] ibzib commented on a change in pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #15081:
URL: https://github.com/apache/beam/pull/15081#discussion_r661857627



##########
File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
##########
@@ -79,13 +79,16 @@
 
   private static final String dockerContainerImageOption = "docker_container_image";
   private static final String externalServiceAddressOption = "external_service_address";
+  private static final String externalServiceExecutableOption = "external_service_executable";

Review comment:
       External workers should be completely decoupled from the runner. If we introduce `external_service_executable` as a pipeline option, we have to add it to the contract between the runner and the worker.
   
   The configuration of the workers should be left to the worker pool wherever possible. So instead of a pipeline option, we'd add command line arguments in the worker pool's main method. WDYT?
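
As a rough sketch of that alternative, the worker pool's `main` method could read its own `--key=value` arguments, so nothing new enters the runner/worker contract. The class name and the flag names `service_port` and `worker_executable` are hypothetical, chosen only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class WorkerPoolArgs {
  /** Parses "--key=value" arguments into a map; anything else is rejected. */
  public static Map<String, String> parse(String[] args) {
    Map<String, String> options = new HashMap<>();
    for (String arg : args) {
      int eq = arg.indexOf('=');
      // eq < 3 also rejects an empty key such as "--=value".
      if (!arg.startsWith("--") || eq < 3) {
        throw new IllegalArgumentException("Expected --key=value, got: " + arg);
      }
      options.put(arg.substring(2, eq), arg.substring(eq + 1));
    }
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> options = parse(args);
    // Hypothetical flags: neither name is an existing Beam option.
    String port = options.getOrDefault("service_port", "50000");
    String executable = options.get("worker_executable"); // null could mean "run workers in threads"
    System.out.println("port=" + port + " executable=" + executable);
  }
}
```

With this shape, the pipeline only needs to know the pool's address; how the pool starts workers stays a deployment detail of the pool itself.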

##########
File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
##########
@@ -79,13 +79,16 @@
 
   private static final String dockerContainerImageOption = "docker_container_image";
   private static final String externalServiceAddressOption = "external_service_address";
+  private static final String externalServiceExecutableOption = "external_service_executable";

Review comment:
       What is the executable expected to be? What benefit is there to allowing an arbitrary executable? We may want to simplify things for the user by making this a boolean (processes/threads) with a fixed executable otherwise.







[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-870008069


   Run Java PreCommit





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-878505102


   > > I am curious why artifact staging does not work with threads? I wonder if we should fix that instead of introducing yet more complexity to this already complex API.
   > > In Python, I thought we used processes instead of threads because of the GIL. But Java has no GIL, so I'm not sure there is an advantage to using processes.
   > 
   > Using threads still makes sense for IO bound tasks in Python since Python can parallelize IO effectively. Python's GIL is problematic for CPU bound tasks.
   
   @lukecwik @ibzib Correct me if I am wrong: my understanding is that we use process mode mainly because we can simplify the workflow by reusing the boot executable, which can only be executed in a subprocess rather than a thread. In addition, the boot executable starts the actual worker in a subprocess too.
   
   It is true that we could implement a new workflow to support thread mode instead of relying on the boot executable, but it could be much more significant work. Let me know if you think it is worth the effort.
   
   In addition, I am wondering if we could add a prepare step in external pool mode, so that we would not need to run artifact staging for each start worker request. WDYT?
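
One way to picture the proposed prepare step is a pool that stages artifacts once per key and reuses the result for every later start-worker request. This is a sketch under stated assumptions: the class, the `artifactKey` notion, and the staging directory layout are all made up for illustration and do not reflect the actual ExternalWorkerService API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PreparedWorkerPool {
  private final Map<String, String> stagedDirs = new ConcurrentHashMap<>();
  private final AtomicInteger stagingRuns = new AtomicInteger();

  /** Prepare step: stages artifacts at most once per key and remembers the location. */
  public String prepare(String artifactKey) {
    return stagedDirs.computeIfAbsent(artifactKey, key -> {
      stagingRuns.incrementAndGet();
      // Placeholder for real artifact retrieval; the path layout is invented.
      return "/tmp/staged/" + key;
    });
  }

  /** Start-worker request: reuses the staged directory instead of staging again. */
  public String startWorker(String workerId, String artifactKey) {
    return workerId + " -> " + prepare(artifactKey);
  }

  /** Number of times staging actually ran. */
  public int stagingRuns() {
    return stagingRuns.get();
  }

  public static void main(String[] args) {
    PreparedWorkerPool pool = new PreparedWorkerPool();
    System.out.println(pool.startWorker("worker-1", "job-a"));
    System.out.println(pool.startWorker("worker-2", "job-a"));
    System.out.println("staging runs: " + pool.stagingRuns()); // prints "staging runs: 1"
  }
}
```

`computeIfAbsent` keeps the staging atomic per key, so concurrent start-worker requests for the same job would not stage twice.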
   
   
   





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-870052377


   Run Java PreCommit





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-879282133


   @lukecwik Is your suggestion to stick with thread mode in Java and implement prepare/artifact staging separately from the existing boot script?





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-870121044


   Run Java PreCommit





[GitHub] [beam] kw2542 removed a comment on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 removed a comment on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-879281777


   > Each Python SDK process instance is capable of running multiple work items in parallel already. The issue is that the Python GIL will limit it to use a single CPU core which is why multiple Python SDK process instances are launched. Whether they are launched by boot.go or someone else isn't too important. The prepare step sounds great for the external pool mode as well since that is what we want for docker for Apache Beam as well.
   
   Is your suggestion to stick with thread mode in Java and implement prepare/artifact staging separately from the existing boot script?





[GitHub] [beam] lukecwik commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-880078187


   > @lukecwik Is your suggestion to stick with thread mode in Java and implement prepare/artifact staging separately from the existing boot script?
   
   Yes, but not all Java/Go jobs would benefit from it, so we would want it to be optional and used only by jobs with a high start-up cost (lots of jars or user data).
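
One way to picture an optional prepare step is sketched below. This is only an illustration under stated assumptions: the class, the `prepare` and `startWorker` methods, and the idea of keying staged artifacts by a job id are all hypothetical and not taken from Beam. The sketch stages artifacts at most once per job and lets later start-worker requests reuse the result.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a worker pool that stages artifacts once per job.
class PreparedArtifactsSketch {
  // Maps a job id to the directory where its artifacts were staged.
  private final Map<String, String> stagedDirs = new ConcurrentHashMap<>();
  // Performs the (potentially expensive) staging; supplied by the caller.
  private final Function<String, String> stager;

  PreparedArtifactsSketch(Function<String, String> stager) {
    this.stager = stager;
  }

  /** Optional prepare step: stages artifacts for the job if not already staged. */
  String prepare(String jobId) {
    return stagedDirs.computeIfAbsent(jobId, stager);
  }

  /** Start-worker request: reuses staged artifacts, staging lazily if prepare was skipped. */
  String startWorker(String jobId, String workerId) {
    String dir = prepare(jobId);
    return workerId + "@" + dir;
  }
}
```

Because `startWorker` falls back to staging lazily, jobs with cheap start-up can skip the explicit `prepare` call, while jobs with many jars or much user data pay the staging cost only once.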





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-879281777


   > Each Python SDK process instance is capable of running multiple work items in parallel already. The issue is that the Python GIL will limit it to use a single CPU core which is why multiple Python SDK process instances are launched. Whether they are launched by boot.go or someone else isn't too important. The prepare step sounds great for the external pool mode as well since that is what we want for docker for Apache Beam as well.
   
   Is your suggestion to stick with thread mode in Java and implement prepare/artifact staging separately from the existing boot script?





[GitHub] [beam] kw2542 commented on a change in pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on a change in pull request #15081:
URL: https://github.com/apache/beam/pull/15081#discussion_r661872721



##########
File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
##########
@@ -79,13 +79,16 @@
 
   private static final String dockerContainerImageOption = "docker_container_image";
   private static final String externalServiceAddressOption = "external_service_address";
+  private static final String externalServiceExecutableOption = "external_service_executable";

Review comment:
       Good question; I asked myself the same thing.
   
   I ended up making the executable configurable because I noticed this is the pattern in the Python external worker pool support [1]. However, I also noticed that the boot script [2] itself hard-codes the executable to `/opt/apache/beam/boot` when `--worker_pool` is specified.
   
   I suppose we could follow the same pattern as Python: make the executable configurable in the external worker pool itself, but hard-code `/opt/apache/beam/boot` in boot. Alternatively, we could just hard-code it in the external worker pool. WDYT?
   
   [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/worker_pool_main.py#L64
   [2] https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L91
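
The two alternatives discussed above (configurable versus hard-coded) can be combined into a "configurable with a default" lookup. The sketch below is an assumption-laden illustration, not the PR's implementation: the class and method names are hypothetical, and only the option key `external_service_executable` and the path `/opt/apache/beam/boot` come from the discussion.

```java
import java.util.Map;

// Hypothetical sketch: resolve the executable from environment options,
// falling back to the path that the boot script hard-codes.
class ExternalExecutableOptionSketch {
  static final String EXTERNAL_SERVICE_EXECUTABLE_OPTION = "external_service_executable";
  static final String DEFAULT_BOOT_EXECUTABLE = "/opt/apache/beam/boot";

  /** Returns the configured executable, or the default when none is set. */
  static String resolveExecutable(Map<String, String> environmentOptions) {
    String configured = environmentOptions.get(EXTERNAL_SERVICE_EXECUTABLE_OPTION);
    if (configured == null || configured.isEmpty()) {
      return DEFAULT_BOOT_EXECUTABLE;
    }
    return configured;
  }
}
```

With a default in place, users who never set the option get the hard-coded behavior, and the option only matters for deployments that install the boot executable elsewhere.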







[GitHub] [beam] kw2542 closed pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 closed pull request #15081:
URL: https://github.com/apache/beam/pull/15081


   





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-870164791


   Run Java PreCommit





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-870008069









[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-870233907


   @ibzib Can you help take a look?





[GitHub] [beam] kw2542 commented on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

Posted by GitBox <gi...@apache.org>.
kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-880176254


   > > @lukecwik Is your suggestion to stick with thread mode in Java and implement prepare/artifact staging separately from the existing boot script?
   > 
   > Yes, but not all Java/Go jobs would benefit from it, so we would want it to be optional and used only by jobs with a high start-up cost (lots of jars or user data).
   
   Sounds good, I will close this ticket/PR and create another one.

