You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/04/19 23:34:00 UTC

[jira] [Work logged] (BEAM-14332) Improve the workflow of cluster management for Flink on Dataproc

     [ https://issues.apache.org/jira/browse/BEAM-14332?focusedWorklogId=758880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758880 ]

ASF GitHub Bot logged work on BEAM-14332:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Apr/22 23:33
            Start Date: 19/Apr/22 23:33
    Worklog Time Spent: 10m 
      Work Description: KevinGG opened a new pull request, #17402:
URL: https://github.com/apache/beam/pull/17402

   1. Used ClusterMetadata to replace MasterURLIdentifier. Instead of
      MasterURLIdentifier, now Union[str, beam.Pipeline, ClusterMetadata]
      functions as ClusterIdentifier.
   2. Made DataprocClusterManager 1:1 with real clusters instead of 1:1
      with pipelines.
   3. Deprecated bidict and many unnecessary mappings in Clusters class.
   4. Added a create factory in Clusters class to create cluster managers.
   5. Used default cluster metadata to replace the default cluster name.
      Now each cluster created has a distinct cluster name.
   6. Applied mocks to isolate singleton environment/clusters instance in
      tests to avoid test failures due to shared states.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 758880)
    Remaining Estimate: 0h
            Time Spent: 10m

> Improve the workflow of cluster management for Flink on Dataproc
> ----------------------------------------------------------------
>
>                 Key: BEAM-14332
>                 URL: https://issues.apache.org/jira/browse/BEAM-14332
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Ning
>            Assignee: Ning
>            Priority: P2
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Improve the workflow of cluster management.
> There is an option to configure a default [cluster name|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_beam.py#L366]. The existing user flows are:
>  # Use the default cluster name to create a new cluster if none is in use;
>  # Reuse a created cluster that has the default cluster name;
>  # If the default cluster name is configured to a new value, re-apply 1 and 2.
>  A better solution is to 
>  # Create a new cluster implicitly if there is none or explicitly if the user wants one with specific provisioning;
>  # Always default to using the last created cluster.
>  The reasons are:
>  * Cluster name is meaningless to the user when a cluster is just a medium to run OSS runners (as applications) such as Flink or Spark. The cluster could also be running anywhere (on GCP) such as Dataproc, k8s, or even Dataflow itself.
>  * Clusters should be uniquely identified, thus should always have a distinct name. Clusters are managed (created/reused/deleted) behind the scenes by the notebook runtime when the user doesn’t explicitly do so (the capability to explicitly manage clusters is still available). Reusing the same default cluster name is risky when a cluster is deleted by one notebook runtime while another cluster with the same name is created by a different notebook runtime. 
>  * Provide the capability for the user to explicitly provision a cluster.
> Current implementation provisions each cluster at the location specified by GoogleCloudOptions using 3 worker nodes. There is no explicit API to configure the number or shape of workers.
> We could use the WorkerOptions to allow customers to explicitly provision a cluster and expose an explicit API (with UX in notebook extension) for customers to change the size of a cluster connected with their notebook (until we have an auto scaling solution with Dataproc for Flink).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)