Posted to user@beam.apache.org by Kyle Weaver <kc...@google.com> on 2019/12/02 18:34:59 UTC

Re: Installing system dependencies in a DataFlow worker - how

Sorry Carl, I think the page I sent you might be either incorrect or
incomplete. I filed https://issues.apache.org/jira/browse/BEAM-8863 for
that.

In the meantime, the instructions on the page Luke linked should work.

On Thu, Nov 28, 2019 at 9:38 AM Carl Thomé <ca...@gmail.com> wrote:

> Thanks!
>
> I tried following https://beam.apache.org/documentation/runtime/environments/
> but get a "Custom images are not yet supported" error message from DataFlow.
> Perhaps I did something wrong?
>
> "error": {
>   "code": 400,
>   "message": "(24f8c9b6e647d55d): The workflow could not be created.
>   Causes: (24f8c9b6e647de48): Invalid worker harness container image:
>   my_image. Custom images are not yet supported.",
>   "status": "INVALID_ARGUMENT"
> }
>
> On Wed, 27 Nov 2019 at 18:34, Kyle Weaver <kc...@google.com> wrote:
>
>> You can also configure your own Docker images if you like, instructions
>> here: https://beam.apache.org/documentation/runtime/environments/
>>
>> On Wed, Nov 27, 2019 at 12:38 AM Carl Thomé <ca...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a Beam pipeline written in the Python SDK that decodes audio
>>> files into TFRecords. I'd like to run it on DataFlow but I'm missing
>>> libsndfile1 in the workers.
>>>
>>> Is there any way of configuring the base image for the DataFlow workers
>>> (e.g. Dockerfile + apt install) to get audio decoding working?
>>>
>>> On a similar note, when it comes to Python dependencies in the DataFlow
>>> runtime (like librosa), is there a wish list somewhere on which we can
>>> upvote missing Python libraries?
>>>
>>> Cheers,
>>> Carl Thomé
>>>
>>

Re: Installing system dependencies in a DataFlow worker - how

Posted by Valentyn Tymofieiev <va...@google.com>.
Dataflow currently supports custom containers only for pipelines that set
--experiments=beam_fn_api. All Python streaming pipelines set it implicitly.
Batch pipelines must set this flag manually, and some functionality may not
be available with it [1]. You could try custom containers with a simple
pipeline (e.g. wordcount) first, and then with your own pipeline. You can
also run your pipeline on the Dataflow runner without custom containers but
with --experiments=beam_fn_api to isolate any issues you might hit.

[1]
https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0
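
As a rough illustration of the two-step approach described above, here is a
minimal sketch of how the pipeline arguments might be assembled. The helper
name dataflow_args, the project, bucket, and image values are hypothetical.
The --worker_harness_container_image flag name is an assumption based on the
wording of the error message in this thread and on the Python SDK of that
era; check the options your SDK version accepts before relying on it.

```python
def dataflow_args(project, temp_location, image=None):
    """Build a list of pipeline arguments for a Dataflow run.

    Hypothetical helper: it only assembles strings and does not contact
    Dataflow or import apache_beam.
    """
    args = [
        "--runner=DataflowRunner",
        "--project={}".format(project),
        "--temp_location={}".format(temp_location),
        # From the thread: custom containers need this experiment, and
        # batch pipelines have to set it manually.
        "--experiments=beam_fn_api",
    ]
    if image is not None:
        # Assumed flag name; see the note above the code.
        args.append("--worker_harness_container_image={}".format(image))
    return args


# Step 1: no custom image, only the experiment, to isolate any issues.
baseline = dataflow_args("my-project", "gs://my-bucket/tmp")

# Step 2: the same arguments plus a custom image (placeholder name).
custom = dataflow_args(
    "my-project", "gs://my-bucket/tmp",
    image="gcr.io/my-project/my_image:latest",
)
```

Either list could then be passed to
apache_beam.options.pipeline_options.PipelineOptions(args) when constructing
the pipeline, first with a simple pipeline such as wordcount and then with
the real one.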
