You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Nir Gazit <ni...@gmail.com> on 2021/01/14 14:16:13 UTC

Deploying with Flink and Beam Python Worker

Hi,
I'm trying to deploy the word count job on a Flink cluster in Kubernetes.
However, when trying to run the job (with python workers as a side car to
the Flink task masters), I get the following error:

2021/01/14 14:11:36 Initializing python harness: /opt/apache/beam/boot
--id=1-1 --logging_endpoint=localhost:39233
--artifact_endpoint=localhost:34123 --provision_endpoint=localhost:33519
--control_endpoint=localhost:44987
2021/01/14 14:11:45 Failed to retrieve staged files: failed to retrieve
/tmp/staged in 3 attempts: failed to retrieve chunk for
/tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc =

Anyone knows what could be the reason?

Re: Deploying with Flink and Beam Python Worker

Posted by Sam Bourne <sa...@gmail.com>.
Thanks! I actually managed to have my own deployment for running it locally
and it works well for local file, but I get this weird error while trying
to run the word count example and I’m trying to understand what can be the
cause of it?

If you’re referring to this error:

2021/01/14 14:11:45 Failed to retrieve staged files: failed to retrieve
/tmp/staged in 3 attempts: failed to retrieve chunk for
/tmp/staged/pickled_main_session

Then that is likely because the flink taskmanager cannot find the staged
artifacts on disk. If you’re using the FlinkRunner you use the flag
--flink_submit_uber_jar which will stuff all the artifacts into a jar file
you submit. If you’re using the PortableRunner, then you need to share the
staging volume between your artifact service (typically running wherever
your job service is) and the flink taskworker like I have configured here
<https://github.com/sambvfx/beam-flink-k8s/tree/master/k8s/with_job_server>.

BTW - is this something we may want to add to the official repo? (I can do
it ofc). It took me a while to find how to set up a local deployment for
Flink and the docs weren’t super detailed.

More documentation is certainly helpful. It took me quite some time to
figure out what I have now so I would +1 any effort to save the next person

On Thu, Jan 14, 2021 at 11:42 AM Nir Gazit <ni...@gmail.com> wrote:

> BTW - is this something we may want to add to the official repo? (I can do
> it ofc). It took me a while to find how to set up a local deployment for
> Flink and the docs weren't super detailed.
>
> On Thu, Jan 14, 2021 at 9:39 PM Nir Gazit <ni...@gmail.com> wrote:
>
>> Thanks! I actually managed to have my own deployment for running it
>> locally and it works well for local file, but I get this weird error while
>> trying to run the word count example and I'm trying to understand what can
>> be the cause of it?
>>
>> On Thu, Jan 14, 2021 at 7:29 PM Sam Bourne <sa...@gmail.com> wrote:
>>
>>> Hi Nir,
>>>
>>> I have a simple repo where I have a proof of concept deployment setup
>>> for doing this.
>>>
>>> https://github.com/sambvfx/beam-flink-k8s
>>>
>>> Depending on the type of runner you're using there are a few
>>> explanations. That repo should hopefully point you in the right direction.
>>>
>>> On Thu, Jan 14, 2021 at 6:16 AM Nir Gazit <ni...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I'm trying to deploy the word count job on a Flink cluster in
>>>> Kubernetes. However, when trying to run the job (with python workers as a
>>>> side car to the Flink task masters), I get the following error:
>>>>
>>>> 2021/01/14 14:11:36 Initializing python harness: /opt/apache/beam/boot
>>>> --id=1-1 --logging_endpoint=localhost:39233
>>>> --artifact_endpoint=localhost:34123 --provision_endpoint=localhost:33519
>>>> --control_endpoint=localhost:44987
>>>> 2021/01/14 14:11:45 Failed to retrieve staged files: failed to retrieve
>>>> /tmp/staged in 3 attempts: failed to retrieve chunk for
>>>> /tmp/staged/pickled_main_session
>>>> caused by:
>>>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>>>> /tmp/staged/pickled_main_session
>>>> caused by:
>>>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>>>> /tmp/staged/pickled_main_session
>>>> caused by:
>>>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>>>> /tmp/staged/pickled_main_session
>>>> caused by:
>>>> rpc error: code = Unknown desc =
>>>>
>>>> Anyone knows what could be the reason?
>>>>
>>>

Re: Deploying with Flink and Beam Python Worker

Posted by Nir Gazit <ni...@gmail.com>.
BTW - is this something we may want to add to the official repo? (I can do
it ofc). It took me a while to find how to set up a local deployment for
Flink and the docs weren't super detailed.

On Thu, Jan 14, 2021 at 9:39 PM Nir Gazit <ni...@gmail.com> wrote:

> Thanks! I actually managed to have my own deployment for running it
> locally and it works well for local file, but I get this weird error while
> trying to run the word count example and I'm trying to understand what can
> be the cause of it?
>
> On Thu, Jan 14, 2021 at 7:29 PM Sam Bourne <sa...@gmail.com> wrote:
>
>> Hi Nir,
>>
>> I have a simple repo where I have a proof of concept deployment setup for
>> doing this.
>>
>> https://github.com/sambvfx/beam-flink-k8s
>>
>> Depending on the type of runner you're using there are a few
>> explanations. That repo should hopefully point you in the right direction.
>>
>> On Thu, Jan 14, 2021 at 6:16 AM Nir Gazit <ni...@gmail.com> wrote:
>>
>>> Hi,
>>> I'm trying to deploy the word count job on a Flink cluster in
>>> Kubernetes. However, when trying to run the job (with python workers as a
>>> side car to the Flink task masters), I get the following error:
>>>
>>> 2021/01/14 14:11:36 Initializing python harness: /opt/apache/beam/boot
>>> --id=1-1 --logging_endpoint=localhost:39233
>>> --artifact_endpoint=localhost:34123 --provision_endpoint=localhost:33519
>>> --control_endpoint=localhost:44987
>>> 2021/01/14 14:11:45 Failed to retrieve staged files: failed to retrieve
>>> /tmp/staged in 3 attempts: failed to retrieve chunk for
>>> /tmp/staged/pickled_main_session
>>> caused by:
>>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>>> /tmp/staged/pickled_main_session
>>> caused by:
>>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>>> /tmp/staged/pickled_main_session
>>> caused by:
>>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>>> /tmp/staged/pickled_main_session
>>> caused by:
>>> rpc error: code = Unknown desc =
>>>
>>> Anyone knows what could be the reason?
>>>
>>

Re: Deploying with Flink and Beam Python Worker

Posted by Nir Gazit <ni...@gmail.com>.
Thanks! I actually managed to have my own deployment for running it locally
and it works well for local file, but I get this weird error while trying
to run the word count example and I'm trying to understand what can be the
cause of it?

On Thu, Jan 14, 2021 at 7:29 PM Sam Bourne <sa...@gmail.com> wrote:

> Hi Nir,
>
> I have a simple repo where I have a proof of concept deployment setup for
> doing this.
>
> https://github.com/sambvfx/beam-flink-k8s
>
> Depending on the type of runner you're using there are a few explanations.
> That repo should hopefully point you in the right direction.
>
> On Thu, Jan 14, 2021 at 6:16 AM Nir Gazit <ni...@gmail.com> wrote:
>
>> Hi,
>> I'm trying to deploy the word count job on a Flink cluster in Kubernetes.
>> However, when trying to run the job (with python workers as a side car to
>> the Flink task masters), I get the following error:
>>
>> 2021/01/14 14:11:36 Initializing python harness: /opt/apache/beam/boot
>> --id=1-1 --logging_endpoint=localhost:39233
>> --artifact_endpoint=localhost:34123 --provision_endpoint=localhost:33519
>> --control_endpoint=localhost:44987
>> 2021/01/14 14:11:45 Failed to retrieve staged files: failed to retrieve
>> /tmp/staged in 3 attempts: failed to retrieve chunk for
>> /tmp/staged/pickled_main_session
>> caused by:
>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>> /tmp/staged/pickled_main_session
>> caused by:
>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>> /tmp/staged/pickled_main_session
>> caused by:
>> rpc error: code = Unknown desc = ; failed to retrieve chunk for
>> /tmp/staged/pickled_main_session
>> caused by:
>> rpc error: code = Unknown desc =
>>
>> Anyone knows what could be the reason?
>>
>

Re: Deploying with Flink and Beam Python Worker

Posted by Sam Bourne <sa...@gmail.com>.
Hi Nir,

I have a simple repo where I have a proof of concept deployment setup for
doing this.

https://github.com/sambvfx/beam-flink-k8s

Depending on the type of runner you're using there are a few explanations.
That repo should hopefully point you in the right direction.

On Thu, Jan 14, 2021 at 6:16 AM Nir Gazit <ni...@gmail.com> wrote:

> Hi,
> I'm trying to deploy the word count job on a Flink cluster in Kubernetes.
> However, when trying to run the job (with python workers as a side car to
> the Flink task masters), I get the following error:
>
> 2021/01/14 14:11:36 Initializing python harness: /opt/apache/beam/boot
> --id=1-1 --logging_endpoint=localhost:39233
> --artifact_endpoint=localhost:34123 --provision_endpoint=localhost:33519
> --control_endpoint=localhost:44987
> 2021/01/14 14:11:45 Failed to retrieve staged files: failed to retrieve
> /tmp/staged in 3 attempts: failed to retrieve chunk for
> /tmp/staged/pickled_main_session
> caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for
> /tmp/staged/pickled_main_session
> caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for
> /tmp/staged/pickled_main_session
> caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for
> /tmp/staged/pickled_main_session
> caused by:
> rpc error: code = Unknown desc =
>
> Anyone knows what could be the reason?
>