Posted to dev@beam.apache.org by Steve Niemitz <sn...@apache.org> on 2020/12/18 17:05:54 UTC

XLang pipelines in dataflow pull from docker.io by default?

I'm playing around with xlang portable pipelines in Dataflow and noticed
that it tries to pull the Java harness (beam_java8_sdk:2.25.0) from
docker.io.  This is problematic because our VPC prevents access to external
hosts.  I was able to fix the problem by passing in

--sdk_harness_container_image_overrides=.*java.*,gcr.io/cloud-dataflow/v1beta3/beam_java8_sdk:2.25.0

to my job, but it's not ideal to have to do this.  Is there a reason the
default location is docker.io rather than GCR?  Especially given that
Docker is going to substantially limit pulls per hour in the near
future.
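
For anyone hitting the same issue, here is a rough Python sketch of how the
override value above is put together and how a "regex,image" pair like this
can be read. The flag name, the regex, and the gcr.io image come from this
message. The helper functions and the two default image names are
hypothetical and only for illustration; the matching logic is an assumption
and is not Beam's or Dataflow's actual code.

```python
import re

# Values taken from the message above.
JAVA_PATTERN = ".*java.*"
GCR_JAVA_IMAGE = "gcr.io/cloud-dataflow/v1beta3/beam_java8_sdk:2.25.0"


def override_flag(pattern, image):
    """Format one override as a single argument (no space after the comma)."""
    return f"--sdk_harness_container_image_overrides={pattern},{image}"


def parse_override(value):
    """Split 'regex,image' on the first comma into a (regex, image) pair."""
    pattern, image = value.split(",", 1)
    return pattern, image


def resolve_image(default_image, overrides):
    """Return the replacement for the first regex matching the whole name.

    Hypothetical helper: it only illustrates the regex-to-image idea.
    """
    for pattern, image in overrides:
        if re.fullmatch(pattern, default_image):
            return image
    return default_image


flag = override_flag(JAVA_PATTERN, GCR_JAVA_IMAGE)
overrides = [parse_override(flag.split("=", 1)[1])]

# Assumed default image names, for illustration only.
java_default = "docker.io/apache/beam_java8_sdk:2.25.0"
python_default = "docker.io/apache/beam_python3.7_sdk:2.25.0"

print(flag)
print(resolve_image(java_default, overrides))
print(resolve_image(python_default, overrides))
```

In this sketch only the Java harness image is redirected; an image whose
name does not contain "java" is left as it is.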

Re: XLang pipelines in dataflow pull from docker.io by default?

Posted by Chamikara Jayalath <ch...@google.com>.
Yeah, currently containers other than the one for the pipeline SDK (for
example, the Java SDK Harness container if you are using a Java transform
from Python) will be pulled from Docker Hub. You can override this using the
option you mentioned.

We are working on copying all containers to GCR and pulling them from
there, but we are not there yet.

Thanks,
Cham
