Posted to issues@beam.apache.org by "Miles Edwards (Jira)" <ji...@apache.org> on 2019/09/11 09:38:00 UTC

[jira] [Updated] (BEAM-8189) DataflowRunner fails when using a Shared VPC from another project

     [ https://issues.apache.org/jira/browse/BEAM-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miles Edwards updated BEAM-8189:
--------------------------------
    Summary: DataflowRunner fails when using a Shared VPC from another project  (was: DataflowRunner does not work with Shared VPC in another Project)

> DataflowRunner fails when using a Shared VPC from another project
> -----------------------------------------------------------------
>
>                 Key: BEAM-8189
>                 URL: https://issues.apache.org/jira/browse/BEAM-8189
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.15.0
>            Reporter: Miles Edwards
>            Priority: Major
>
> h1. The Setup:
> I have two Projects on the Google Cloud Platform
> 1) Service Project for my Dataflow jobs
> 2) Host Project for Shared VPC & Subnetworks
> The Host Project has Firewall Rules configured for the Dataflow job, e.g. allow all traffic, allow all internal traffic, and allow all traffic tagged with 'dataflow'.
>  
> h1. The Args
> {code:java}
> --project <host project name>
> --network <shared vpc project name>
> --subnetwork "https://www.googleapis.com/compute/v1/projects/<shared vpc project name>/regions/<region job is running in service project>/subnetworks/<name of subnetwork in shared vpc project>"
> --service_account_email=<service account with Compute Network User permission for both projects, shared vpc network & subnetwork>
> {code}
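> As a minimal sketch (in Python) of how the fully qualified subnetwork URL above is assembled; the project, region, and subnetwork names below are hypothetical placeholders:
> {code:python}
> # Hedged sketch: build the fully qualified subnetwork URL Dataflow expects.
> # Note the project segment must be the *host* project that owns the Shared
> # VPC, even though the job itself runs in the service project.
> SUBNETWORK_TEMPLATE = (
>     "https://www.googleapis.com/compute/v1/projects/{host_project}"
>     "/regions/{region}/subnetworks/{subnetwork}"
> )
>
> def shared_vpc_subnetwork_url(host_project, region, subnetwork):
>     return SUBNETWORK_TEMPLATE.format(
>         host_project=host_project, region=region, subnetwork=subnetwork
>     )
>
> # Hypothetical values, for illustration only:
> print(shared_vpc_subnetwork_url("host-project", "us-east1", "dataflow-subnet"))
> {code}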
> h1. The Problem
> The job hangs on shuffles when it is set to run within the service project while using the host project's network. I also see the following warning:
> {code:java}
> The network miles-qa-vpc doesn't have rules that open TCP ports 1-65535 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: No firewall rules associated with your network.
> {code}
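> For reference, the rule the warning asks for can be sketched as a Compute Engine firewall resource body (a hedged illustration; the rule name and source range are hypothetical, and the rule would need to be created in the host project that owns the Shared VPC):
> {code:python}
> # Hedged sketch of a firewall rule opening TCP ports 1-65535 between
> # workers tagged 'dataflow', as the warning requests. All names and
> # ranges here are hypothetical placeholders.
> def dataflow_shuffle_firewall_rule(network_url, source_range):
>     return {
>         "name": "allow-dataflow-shuffle",  # hypothetical rule name
>         "network": network_url,
>         "direction": "INGRESS",
>         "allowed": [{"IPProtocol": "tcp", "ports": ["1-65535"]}],
>         "sourceRanges": [source_range],    # e.g. the subnetwork's CIDR
>         "targetTags": ["dataflow"],
>     }
> {code}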
>  
> h1. What I've Tried
> As mentioned in my [StackOverflow question|https://stackoverflow.com/questions/57868089/google-dataflow-warnings-when-using-service-host-projects-shared-vpcs-firew], I've tried the following:
> 1. Only passing the "subnetwork" arg without "network", but that only changes the warning to state "default" instead of "miles-qa-vpc", which sounds like a logging error to me.
> 2. Firewall rules have been configured to:
>  - allow all traffic
>  - allow all internal traffic
>  - allow all traffic with the source tag 'dataflow'
>  - allow all traffic with the target tag 'dataflow'
> 3. The Service Account has been configured with Compute Network User permissions in both projects.
> 4. Ensured the subnetwork is in the same region as the job.
> 5. The network is happily serving a dedicated cluster for other purposes in the host project.
> It genuinely seems like the spawned Compute Instances are not picking up the firewall configuration.
> I expect the Dataflow job not to report the firewall issue and to successfully handle shuffling (GroupBys etc.).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)