Posted to dev@beam.apache.org by Ismaël Mejía <ie...@gmail.com> on 2021/06/10 10:00:19 UTC

Re: Multiple architectures support on Beam (ARM)

As a follow-up: with the merge of
https://github.com/apache/beam/pull/14832, Beam will produce Python
wheels for AARCH64 starting with Beam 2.32.0!
Also, thanks to the recent version updates (grpc, protobuf and arrow), we
should be close to fully supporting it without extra compilation.
The only missing piece seems to be Cython:
https://github.com/cython/cython/issues/3892
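What "wheels for AARCH64" buys users: pip only installs a pre-built wheel when its platform tag matches the machine; otherwise it falls back to the source distribution and needs a compiler, as in the grpcio-tools log quoted below. A minimal sketch, assuming the PEP 427 wheel filename layout; the helper names and example filenames are hypothetical, and pip's real matching (via the `packaging` library) is considerably stricter.

```python
import platform
from typing import List


def wheel_platform_tags(filename: str) -> List[str]:
    """Return the platform tag(s) of a wheel filename.

    PEP 427 layout: {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    A compressed tag set such as "manylinux_2_17_aarch64.manylinux2014_aarch64"
    is split on the dots.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    platform_field = filename[: -len(".whl")].split("-")[-1]
    return platform_field.split(".")


def wheel_fits_linux_arch(filename: str, arch: str) -> bool:
    """Rough Linux-only check: pure-Python ('any') wheels fit every machine;
    binary wheels fit only if one of their tags is a Linux tag for `arch`.
    Python/ABI tags and the glibc version are ignored in this sketch."""
    for tag in wheel_platform_tags(filename):
        if tag == "any":
            return True
        if "linux" in tag and tag.endswith("_" + arch):
            return True
    return False


if __name__ == "__main__":
    arch = platform.machine()  # 'aarch64' on ARM64 Linux, 'x86_64' on Intel/AMD
    for name in (
        # Hypothetical file names, for illustration only.
        "example_pkg-1.0-cp38-cp38-manylinux2014_x86_64.whl",
        "example_pkg-1.0-cp38-cp38-manylinux2014_aarch64.whl",
        "example_pkg-1.0-py3-none-any.whl",
    ):
        verdict = "match" if wheel_fits_linux_arch(name, arch) else "no match, pip would try the sdist"
        print(f"{name}: {verdict}")
```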

The next important step would be to make the Docker images multi-arch.
That would be a great contribution if someone is motivated.
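For anyone picking this up, a multi-arch image is usually produced with a single Docker Buildx invocation that builds one variant per platform and pushes them under one tag as a manifest list. A minimal sketch that only assembles the command, assuming Buildx is installed (plus QEMU emulation when building the ARM variant on an x86 host); the image name is a placeholder and this is not Beam's actual build configuration.

```python
import shlex
from typing import List, Sequence

# The existing x86 images plus ARM64.
DEFAULT_PLATFORMS = ("linux/amd64", "linux/arm64")


def buildx_command(image: str, context: str = ".",
                   platforms: Sequence[str] = DEFAULT_PLATFORMS,
                   push: bool = True) -> List[str]:
    """Argv for a multi-platform `docker buildx build`.

    One invocation builds an image per platform; with --push they are
    published under a single tag, and `docker pull` then selects the
    variant matching the host architecture.
    """
    cmd = ["docker", "buildx", "build",
           "--platform", ",".join(platforms),
           "--tag", image]
    if push:
        # Multi-platform results generally cannot be loaded into the
        # classic local image store, so they go straight to a registry.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Hypothetical image name; use a registry you can push to.
    cmd = buildx_command("registry.example.com/beam-python-sdk:dev")
    print(shlex.join(cmd))
    # To actually build: subprocess.run(cmd, check=True)
```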


On Thu, Jan 28, 2021 at 1:47 AM Robert Bradshaw <ro...@google.com> wrote:

> Cython supports ARM64. The issue here is that we don't have a C++ compiler
> available in the container (it's looking for 'cc'), and grpc (and possibly
> others) don't have wheel files for this platform. I wonder if apt-get
> install build-essential would be sufficient.
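The failure quoted further down comes from grpcio-tools' setup.py spawning 'cc' (the check_linker_need_libatomic frame) in a container that has none. A pre-flight check in the same spirit can be sketched as follows; `find_cxx_compiler` is a hypothetical helper, and the install hint assumes a Debian/Ubuntu-based image where build-essential provides cc and c++.

```python
import shutil
import subprocess
from typing import Optional, Sequence

# Minimal C++ translation unit fed on stdin, like the probe in the traceback.
_PROBE = b"int main() { return 0; }\n"


def find_cxx_compiler(candidates: Sequence[str] = ("cc", "c++", "g++", "clang++")) -> Optional[str]:
    """Return the first compiler on PATH that accepts C++11 source, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        # -fsyntax-only: parse and type-check only, no linking or output file.
        result = subprocess.run(
            [path, "-x", "c++", "-std=c++11", "-fsyntax-only", "-"],
            input=_PROBE, capture_output=True)
        if result.returncode == 0:
            return path
    return None


if __name__ == "__main__":
    compiler = find_cxx_compiler()
    if compiler:
        print("C++ compiler found:", compiler)
    else:
        # Suggestion from this thread for Debian/Ubuntu-based images.
        print("No C++ compiler found; try: apt-get install build-essential")
```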
>
> On Wed, Jan 27, 2021 at 2:22 PM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Nice to see the interest. I also suppose that devs on Apple MacBooks with the
>> new M1 processor will soon request this feature.
>>
>> Today I ran some pipelines on ARM64 on classic runners relatively easily,
>> which was expected. However, we will have issues with the Java 8 SDK harness
>> because the parent image openjdk:8 does not support ARM64 yet.
>>
>> I tried to set up a Python dev environment and found the first issue. It looks
>> like gRPC does not support arm64 yet [1][2], or am I misreading it?
>>
>> $ pip install -r build-requirements.txt
>>
>> Collecting grpcio-tools==1.30.0
>>   Downloading grpcio-tools-1.30.0.tar.gz (2.1 MB)
>>      |████████████████████████████████| 2.1 MB 21.7 MB/s
>>     ERROR: Command errored out with exit status 1:
>>      command: /home/ubuntu/.virtualenvs/beam-dev/bin/python3 -c
>> 'import sys, setuptools, tokenize; sys.argv[0] =
>> '"'"'/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py'"'"';
>> __file__='"'"'/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py'"'"';f=getattr(tokenize,
>> '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
>> '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
>> egg_info --egg-base /tmp/pip-pip-egg-info-km8agjf4
>>          cwd: /tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/
>>     Complete output (11 lines):
>>     Traceback (most recent call last):
>>       File "<string>", line 1, in <module>
>>       File "/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py", line 112, in <module>
>>         if check_linker_need_libatomic():
>>       File "/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py", line 73, in check_linker_need_libatomic
>>         cc_test = subprocess.Popen(['cc', '-x', 'c++', '-std=c++11', '-'],
>>       File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
>>         self._execute_child(args, executable, preexec_fn, close_fds,
>>       File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
>>         raise child_exception_type(errno_num, err_msg, err_filename)
>>     FileNotFoundError: [Errno 2] No such file or directory: 'cc'
>>     ----------------------------------------
>> WARNING: Discarding
>> https://files.pythonhosted.org/packages/da/3c/bed275484f6cc262b5de6ceaae36798c60d7904cdd05dc79cc830b880687/grpcio-tools-1.30.0.tar.gz#sha256=7878adb93b0c1941eb2e0bed60719f38cda2ae5568bc0bcaa701f457e719a329
>> (from https://pypi.org/simple/grpcio-tools/). Command errored out with
>> exit status 1: python setup.py egg_info Check the logs for full
>> command output.
>> ERROR: Could not find a version that satisfies the requirement grpcio-tools==1.30.0
>> ERROR: No matching distribution found for grpcio-tools==1.30.0
>>
>> [1] https://pypi.org/project/grpcio-tools/#files
>> [2] https://github.com/grpc/grpc/issues/21283
>>
>> I can also imagine that we will have some struggles with the Python harness
>> and all of its dependencies. Does Cython already support ARM64?
>>
>> I went and filed some JIRAs to keep track of this:
>>
>> BEAM-11703 Support apache-beam python install on ARM64
>> BEAM-11704 Support Beam docker images on ARM64
>>
>>
>> On Tue, Jan 26, 2021 at 8:48 PM Robert Burke <ro...@frantil.com> wrote:
>> >
>> > I believe so.
>> >
>> > In most cases, the Go SDK requires users to Register their DoFns at package
>> > init time, keyed by the type's/function's fully qualified path as determined
>> > by Go, which is consistent across architectures, at least with the standard
>> > toolchain.
>> >
>> > Those strings are used to look things up on distributed workers,
>> > regardless of the architecture.
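The principle Robert Burke describes, resolving user functions by a stable, fully qualified name instead of by an address inside one particular binary, can be illustrated with a Python analogy. This is not the Go SDK's API; `register`, `lookup` and `extract_words` are hypothetical names. The key that travels from launcher to worker is a plain string, so it does not depend on the architecture the worker was built for.

```python
from typing import Callable, Dict

# Process-wide registry: fully qualified name -> function.
_REGISTRY: Dict[str, Callable] = {}


def qualified_name(fn: Callable) -> str:
    """Stable key derived from where the function is defined, not its memory address."""
    return f"{fn.__module__}.{fn.__qualname__}"


def register(fn: Callable) -> Callable:
    """Decorator applied at import time, analogous to registering DoFns at package init."""
    _REGISTRY[qualified_name(fn)] = fn
    return fn


def lookup(name: str) -> Callable:
    """What a worker would do with the name it received from the launcher."""
    return _REGISTRY[name]


@register
def extract_words(line: str):
    return line.split()


if __name__ == "__main__":
    key = qualified_name(extract_words)  # e.g. '__main__.extract_words'
    print(key, lookup(key)("hello beam"))
```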
>> >
>> >
>> >
>> > On Tue, Jan 26, 2021, 11:33 AM Robert Bradshaw <ro...@google.com> wrote:
>> >>
>> >> Cool. Are DoFn (et al.) references compatible across cross-compiled
>> >> binaries?
>> >>
>> >> On Tue, Jan 26, 2021 at 11:23 AM Robert Burke <ro...@frantil.com> wrote:
>> >>>
>> >>> Go cross-compilation can be as simple as setting the right environment
>> >>> variables [1], but can be as complicated as requiring a cross-compiling GCC
>> >>> toolchain if cgo [2] is necessary. I think we're probably fine with just
>> >>> the environment flags for the various boot executables.
>> >>>
>> >>> For Go pipelines, we'd need to update the shared runner code to support
>> >>> selecting the cross-compiled worker binary environment. I believe it's
>> >>> hard-coded to linux/amd64 at present, but that's a separate issue.
>> >>>
>> >>> [1] https://golangcookbook.com/chapters/running/cross-compiling/
>> >>> [2] https://golang.org/cmd/cgo/
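Driving such a cross-compile from a build script mostly means preparing the environment. GOOS, GOARCH and CGO_ENABLED are standard Go toolchain variables; setting CGO_ENABLED=0 avoids needing a cross C toolchain, matching the expectation that the boot executables only need the flags. A minimal sketch: the machine-name-to-GOARCH mapping, the helper names, and the `./boot` package path are assumptions for illustration.

```python
import os
from typing import Dict, Mapping, Optional

# Partial, illustrative mapping from platform.machine() spellings to GOARCH.
_MACHINE_TO_GOARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def goarch_for(machine: str) -> str:
    """Translate a machine name (e.g. 'aarch64') into a GOARCH value."""
    try:
        return _MACHINE_TO_GOARCH[machine.lower()]
    except KeyError:
        raise ValueError(f"no GOARCH mapping for {machine!r}") from None


def cross_env(goos: str, goarch: str,
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with GOOS/GOARCH set and cgo disabled."""
    env = dict(os.environ if base is None else base)
    env.update(GOOS=goos, GOARCH=goarch, CGO_ENABLED="0")
    return env


if __name__ == "__main__":
    env = cross_env("linux", goarch_for("aarch64"))
    # Hypothetical output name and package path for a boot executable.
    cmd = ["go", "build", "-o", "boot_linux_arm64", "./boot"]
    print(f"GOOS={env['GOOS']} GOARCH={env['GOARCH']} CGO_ENABLED={env['CGO_ENABLED']}",
          " ".join(cmd))
    # To actually build: subprocess.run(cmd, env=env, check=True)
```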
>> >>>
>> >>> On Tue, Jan 26, 2021, 10:25 AM Robert Bradshaw <ro...@google.com> wrote:
>> >>>>
>> >>>> +1
>> >>>>
>> >>>> I don't think it would be that hard to build and release ARM-based Docker
>> >>>> images. (Perhaps it's just a matter of changing the Dockerfile to depend on
>> >>>> a different base image and doing some cross-compilation. That would suss out
>> >>>> whether we're inadvertently taking on any incompatible dependencies.)
>> >>>>
>> >>>> Theoretically, if one does that and manually specifies the container, it
>> >>>> could just work for Python (assuming no wheel files are specified as manual
>> >>>> dependencies). For Java, if one builds/deploys an uberjar (on a different
>> >>>> architecture), there may be issues with any transitive dependency that has
>> >>>> JNI code (ours or users'). I'd imagine this issue is common to, and being
>> >>>> explored by, many of the other Java big data systems in use; it'd be
>> >>>> interesting to know what solutions are out there.
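Since a jar is a zip archive, a first pass at the JNI concern is to list the native libraries bundled in an uberjar and guess which architecture each targets. A minimal sketch: `find_native_libs` is a hypothetical helper, and both the file-extension list and the path-based architecture hints are heuristics (libraries often, but not always, sit under directories naming their OS and architecture).

```python
import zipfile
from typing import List, Tuple

NATIVE_SUFFIXES = (".so", ".dylib", ".jnilib", ".dll")
ARCH_HINTS = ("aarch64", "arm64", "x86_64", "amd64", "x86-64")


def find_native_libs(jar) -> List[Tuple[str, str]]:
    """Return (entry name, architecture hint or '?') for native libraries in a jar.

    `jar` may be a path or a binary file-like object.
    """
    found = []
    with zipfile.ZipFile(jar) as archive:
        for name in archive.namelist():
            base = name.rsplit("/", 1)[-1]
            # Also catch versioned sonames such as libfoo.so.1.
            if base.endswith(NATIVE_SUFFIXES) or ".so." in base:
                lowered = name.lower()
                hint = next((a for a in ARCH_HINTS if a in lowered), "?")
                found.append((name, hint))
    return found


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        for entry, arch in find_native_libs(sys.argv[1]):
            print(f"{arch:8} {entry}")
```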
>> >>>>
>> >>>> For Go, the executable is uploaded directly into the container. We'd
>> >>>> probably have to do something fancier, like cross-compiling the executable
>> >>>> (and making sure the UserFn references, which I think are just pointers
>> >>>> into the binary, still work if the launcher is one architecture and the
>> >>>> workers another).
>> >>>>
>> >>>> Definitely worth exploring.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Jan 26, 2021 at 10:09 AM Ismaël Mejía <ie...@gmail.com> wrote:
>> >>>>>
>> >>>>> I stumbled today on this user request:
>> >>>>> BEAM-10982 Wheel support for linux aarch64
>> >>>>>
>> >>>>> It made me wonder whether, with the advent of ARM64 processors not only
>> >>>>> on the client but also on the server side (Graviton and others), it is
>> >>>>> worth starting to think about supporting this architecture in the
>> >>>>> Python installers and in the Docker images. For the latter it should not
>> >>>>> be that difficult, given that our parent images are already multi-arch.
>> >>>>>
>> >>>>> Are there any issues or binary/platform-specific dependencies that
>> >>>>> would prevent us from doing this?
>>
>

Re: Multiple architectures support on Beam (ARM)

Posted by Robert Bradshaw <ro...@google.com>.
On Thu, Jun 10, 2021 at 3:00 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
> As a follow-up: with the merge of https://github.com/apache/beam/pull/14832, Beam will produce Python wheels for AARCH64 starting with Beam 2.32.0!

Nice.

> Also, thanks to the recent version updates (grpc, protobuf and arrow), we should be close to fully supporting it without extra compilation.
> The only missing piece seems to be Cython: https://github.com/cython/cython/issues/3892

Cython already supports ARM. This is just about providing pre-built
wheels for installing Cython (which aren't necessarily needed).

> The next important step would be to make the Docker images multi-arch. That would be a great contribution if someone is motivated.