Posted to dev@beam.apache.org by Chad Dombrova <ch...@gmail.com> on 2019/08/08 17:51:48 UTC

Allowing firewalled/offline builds of Beam

This topic came up in another thread, so I wanted to highlight a few things
that we've discovered in our endeavors to build Beam behind a firewall.

Conceptually, in order to allow this, a user needs to provide alternate
mirrors for each "artifact" service required during build, and luckily I
think most of the toolchains used by Beam support this. For example, the
default PyPI index used by pip can be overridden via an environment
variable to point at an internal mirror, and likewise for Docker and its
registry service.  I'm currently looking into gogradle to see if we can
provide an alternate vendor directory as a shared resource behind our
firewall. (I have a bigger question here: why was it necessary to add a
third language to the Python Beam ecosystem just for the bootstrap
process?  Couldn't the boot code use Python or Java?)
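
As a rough illustration of the pip override mentioned above: pip reads the
PIP_INDEX_URL environment variable as the equivalent of its --index-url
option. The mirror hostname below is a made-up placeholder, not a real
service, so treat this as a sketch rather than a working configuration.

```shell
# Point pip at an internal PyPI mirror instead of pypi.org.
# The URL is a hypothetical placeholder; substitute your own mirror.
export PIP_INDEX_URL="https://pypi.internal.example.com/simple"

# Any later pip invocation started from this shell, including ones spawned
# by the build, inherits the variable and resolves against the mirror.
echo "pip index: ${PIP_INDEX_URL}"
```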

But I'm getting ahead of myself.  We're actually stuck at the very
beginning, with gradlew.  The gradlew wrapper seems to unconditionally
download gradle, so you can't get past the first few hundred lines of code
in the build process without requiring internet access.  I made a ticket
here: https://issues.apache.org/jira/browse/BEAM-7931.  I'd love some
pointers on how to fix this, because the offending code lives inside
gradle-wrapper.jar, so I can't change it without access to the source.

thanks,
-chad

Re: Allowing firewalled/offline builds of Beam

Posted by Robert Burke <ro...@frantil.com>.
If the work to switch to Go Modules under gogradle succeeds, it should be
possible to use a proxy hosted inside the firewall for the Go packages
instead of the vendor directories.
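
For reference, with Go Modules the go tool consults the GOPROXY environment
variable when fetching packages, so that setup might look roughly like the
sketch below. The proxy hostname is a hypothetical placeholder, and whether
gogradle passes the variable through to the go tool is an assumption that
would need checking.

```shell
# Route Go module downloads through a proxy hosted inside the firewall.
# The URL is a hypothetical placeholder for an internal module proxy.
export GOPROXY="https://goproxy.internal.example.com"

# Subprocesses started from this shell inherit the setting.
echo "go proxy: ${GOPROXY}"
```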

Re: Allowing firewalled/offline builds of Beam

Posted by Lukasz Cwik <lc...@google.com>.
Udi beat me by a couple of mins.

We build a good portion of the Beam Java codebase internally within Google
by bypassing the gradle wrapper (gradlew) and executing the gradle command
from a full gradle installation at the root of a copy of the Beam codebase.

It does require your internal build system to use a version of gradle that
is compatible with the version[1] that gradlew uses. You could create a
wrapper that figures out which version of gradle to use and selects the
appropriate one from many local gradle installations. This should allow you
to bypass the gradlew script entirely, along with any downloading it does.

Note that gradle does support a --offline flag which we also use to ensure
that it doesn't pull stuff from the internet. Not sure if all the plugins
honor it but it works well enough for us to build most of the Beam Java
codebase with it.
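
A minimal sketch of such a wrapper, combined with the --offline flag, might
look like the following. It is not what we run internally: the function
names and the GRADLE_INSTALLS_DIR variable are made up for illustration, it
assumes each install was unpacked as gradle-<version>, and it only accepts
an exact version match rather than any compatible version.

```shell
# Sketch of a gradlew stand-in that never downloads anything.
# Assumptions (not from Beam itself): the function names, and the
# GRADLE_INSTALLS_DIR variable holding unpacked gradle-<version> directories.

# Print the Gradle version pinned by a gradle-wrapper.properties file,
# e.g. "5.5.1" for a distributionUrl ending in gradle-5.5.1-bin.zip.
wrapper_gradle_version() {
  sed -n 's/^distributionUrl=.*\/gradle-\([0-9][0-9.]*\)-.*\.zip$/\1/p' "$1"
}

# Run the matching local Gradle in offline mode from the root of a checkout.
run_local_gradle() {
  version="$(wrapper_gradle_version gradle/wrapper/gradle-wrapper.properties)"
  gradle_bin="${GRADLE_INSTALLS_DIR:-$HOME}/gradle-${version}/bin/gradle"
  if [ -z "$version" ] || [ ! -x "$gradle_bin" ]; then
    echo "no local Gradle install found for version '${version}'" >&2
    return 1
  fi
  "$gradle_bin" --offline "$@"
}
```

From the root of a Beam checkout, something like
`GRADLE_INSTALLS_DIR=/opt/gradle run_local_gradle lint` would then stand in
for `./gradlew lint` (the /opt/gradle path is only an example).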

1:
https://github.com/apache/beam/blob/497bc77c0d53098887156a014a659184097ef021/gradle/wrapper/gradle-wrapper.properties#L20

Re: Allowing firewalled/offline builds of Beam

Posted by Udi Meiri <eh...@google.com>.
You can download Gradle here: https://gradle.org/releases/
and run it instead of using the wrapper.

Example:
$ cd
$ unzip Downloads/gradle-5.5.1-bin.zip
$ cd ~/src/beam
$ ~/gradle-5.5.1/bin/gradle lint

