You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Yu Watanabe <yu...@gmail.com> on 2019/09/23 12:42:50 UTC

How to reference manifest from apache flink worker node ?

Hello.

I am working on flink runner (2.15.0) and would like to ask question about
how to solve my error.

Currently , I have a remote cluster deployed as below . (please see slide1)
All master and worker nodes are installed on different server from apache
beam.

https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing

When I run beam pipeline, harness container tries to start up, however,
fails immediately with below error on docker side.
=====================================================================================
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
Initializing python harness: /opt/apache/beam/boot --id=1
--logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
--provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
Sep 23 21:04:05 ip-172-31-0-143 dockerd:
time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
module=libcontainerd namespace=moby topic=/tasks/start
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
Failed to retrieve staged files: failed to get manifest
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
Unknown desc =
=====================================================================================

At the same time, task manager logs below error.
=====================================================================================
2019-09-23 21:04:05,525 INFO
 org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
 - GetManifest for
/tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
2019-09-23 21:04:05,526 INFO
 org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
 - Loading manifest for retrieval token
/tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
2019-09-23 21:04:05,531 INFO
 org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
 - GetManifest for
/tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
failed
java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
/tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
(No such file or directory)
        at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
...
=====================================================================================

I see this artifact directory on the server where beam pipeline is executed
but not on worker node.
=====================================================================================
# Beam server
(python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld /tmp/artifactsfkyik3us
drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us

# Flink worker node
[ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
 =====================================================================================

From the error, it seems that container is not starting up correctly due to
manifest file is missing.
What would be a good approach to reference artifact directory from worker
node?
I appreciate if I could get some advice .

Best Regards,
Yu Watanabe

-- 
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.tennis@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
Twitter icon] <https://twitter.com/yuwtennis>

Re: How to reference manifest from apache flink worker node ?

Posted by Yu Watanabe <yu...@gmail.com>.
Ankur.

Thank you for the comment.

Manual start up with job server looks more solid.
Setting TMPDIR then using Flink Runner sounded bit hacky .

Thanks,
Yu

On Tue, Sep 24, 2019 at 2:06 PM Ankur Goenka <go...@google.com> wrote:

> Flink job server does not have artifacts-dir option yet.
> We have a PR to add it https://github.com/apache/beam/pull/9648
>
> However, for now you can do a few manual steps to achieve this.
>
> Start Job server.
>
> 1. Download
> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>
> 2. Start the job server
> java -jar
> /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar
> --flink-master-url ip-172-31-12-113.ap-northeast-1.compute.internal:8081
> --artifacts-dir /home/ec2-user --job-port 8099
>
> 3. Submit your pipeline
> options = PipelineOptions([
>                       "--runner=PortableRunner",
>                       "--environment_config=
> asia.gcr.io/creationline001/beam/python3:latest",
>                       "--environment_type=DOCKER",
>                       "--experiments=beam_fn_api"
>                   ])
>
> On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe <yu...@gmail.com> wrote:
>
>> Kyle.
>>
>> Thank you for the assistance.
>>
>> Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of the
>> pipline.
>> ====================================================================
>> WARNING:root:Discarding unparseable args: ['--flink_version=1.8',
>> '--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081',
>> '--artifacts-dir=/home/ec2-user']
>> ...
>> WARNING:root:Downloading job server jar from
>> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>> DEBUG:root:Starting job service with ['java', '-jar',
>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>> '--flink-master-url',
>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>> '/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0,
>> '--expansion-port', 0]
>> DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757.
>> ...
>> DEBUG:root:Runner option 'job_endpoint' was already added
>> DEBUG:root:Runner option 'sdk_worker_parallelism' was already added
>> WARNING:root:Discarding unparseable args: ['--artifacts-dir=/home/admin']
>> ...
>> ====================================================================
>>
>> My pipeline option .
>> ====================================================================
>>         options = PipelineOptions([
>>                       "--runner=FlinkRunner",
>>                       "--flink_version=1.8",
>>
>> "--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081"
>> ,
>>                       "--environment_config=
>> asia.gcr.io/creationline001/beam/python3:latest",
>>                       "--environment_type=DOCKER",
>>                       "--experiments=beam_fn_api",
>>                       "--artifacts-dir=/home/admin"
>>                   ])
>> ====================================================================
>>
>> Tracing from the log , looks like artifacts dir respects the default
>> tempdir of OS.
>> Thus, to adjust it I will need environment variable instead. I used
>> 'TMPDIR' in my case.
>> ====================================================================
>>
>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203
>>     artifacts_dir = self.local_temp_dir(prefix='artifacts')
>>
>> =>
>>
>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157
>>     return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs)
>>
>> =>
>>
>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102
>>     self._local_temp_root = None
>> ====================================================================
>>
>> Then it worked.
>> ====================================================================
>> (python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR
>> TMPDIR=/home/admin
>>
>> =>
>> DEBUG:root:Starting job service with ['java', '-jar',
>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>> '--flink-master-url',
>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>> '/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0,
>> '--expansion-port', 0]
>> ====================================================================
>>
>> I will use nfs server for now to share the artifact directory with worker
>> nodes.
>>
>> Thanks.
>> Yu Watanabe
>>
>> On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <kc...@google.com> wrote:
>>
>>> The relevant configuration flag for the job server is `--artifacts-dir`.
>>>
>>> @Robert Bradshaw <ro...@google.com> I added this info to the log
>>> message: https://github.com/apache/beam/pull/9646
>>>
>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcweaver@google.com
>>>
>>>
>>> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> You need to set your artifact directory to point to a
>>>> distributed filesystem that's also accessible to the workers (when starting
>>>> up the job server).
>>>>
>>>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <yu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello.
>>>>>
>>>>> I am working on flink runner (2.15.0) and would like to ask question
>>>>> about how to solve my error.
>>>>>
>>>>> Currently , I have a remote cluster deployed as below . (please see
>>>>> slide1)
>>>>> All master and worker nodes are installed on different server from
>>>>> apache beam.
>>>>>
>>>>>
>>>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>>>>>
>>>>> When I run beam pipeline, harness container tries to start up,
>>>>> however, fails immediately with below error on docker side.
>>>>>
>>>>> =====================================================================================
>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>> 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1
>>>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
>>>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
>>>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
>>>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
>>>>> module=libcontainerd namespace=moby topic=/tasks/start
>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>> 12:04:05 Failed to retrieve staged files: failed to get manifest
>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
>>>>> Unknown desc =
>>>>>
>>>>> =====================================================================================
>>>>>
>>>>> At the same time, task manager logs below error.
>>>>>
>>>>> =====================================================================================
>>>>> 2019-09-23 21:04:05,525 INFO
>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>  - GetManifest for
>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>> 2019-09-23 21:04:05,526 INFO
>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>  - Loading manifest for retrieval token
>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>> 2019-09-23 21:04:05,531 INFO
>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>  - GetManifest for
>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>> failed
>>>>> java.util.concurrent.ExecutionException:
>>>>> java.io.FileNotFoundException:
>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>> (No such file or directory)
>>>>>         at
>>>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>>>>> ...
>>>>>
>>>>> =====================================================================================
>>>>>
>>>>> I see this artifact directory on the server where beam pipeline is
>>>>> executed but not on worker node.
>>>>>
>>>>> =====================================================================================
>>>>> # Beam server
>>>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
>>>>> /tmp/artifactsfkyik3us
>>>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>>>>>
>>>>> # Flink worker node
>>>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
>>>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>>>>>
>>>>>  =====================================================================================
>>>>>
>>>>> From the error, it seems that container is not starting up correctly
>>>>> due to manifest file is missing.
>>>>> What would be a good approach to reference artifact directory from
>>>>> worker node?
>>>>> I appreciate if I could get some advice .
>>>>>
>>>>> Best Regards,
>>>>> Yu Watanabe
>>>>>
>>>>> --
>>>>> Yu Watanabe
>>>>> Weekend Freelancer who loves to challenge building data platform
>>>>> yu.w.tennis@gmail.com
>>>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>>>>> Twitter icon] <https://twitter.com/yuwtennis>
>>>>>
>>>>
>>
>> --
>> Yu Watanabe
>> Weekend Freelancer who loves to challenge building data platform
>> yu.w.tennis@gmail.com
>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>> Twitter icon] <https://twitter.com/yuwtennis>
>>
>

-- 
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.tennis@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
Twitter icon] <https://twitter.com/yuwtennis>

Re: How to reference manifest from apache flink worker node ?

Posted by Yu Watanabe <yu...@gmail.com>.
I think beam is a fantastic software . Its a great pleasure contributing.

Thanks,
Yu

On Fri, Sep 27, 2019 at 9:09 AM Robert Bradshaw <ro...@google.com> wrote:

> https://issues.apache.org/jira/browse/BEAM-8312 should help with this. In
> summary, I would say that portable beam-on-Flink is basically feature
> complete, but we're still smoothing out some of the ease-of-use issues. And
> feedback like this really helps, so thanks!
>
> On Tue, Sep 24, 2019 at 8:10 PM Yu Watanabe <yu...@gmail.com> wrote:
>
>> I needed little adjustment with the pipeline option to work it out.
>>
>> Pipeline option required 'job_endpoint'  when using manual start up of
>> job server using jar file.
>> =========================================================
>>         options = PipelineOptions([
>>                       "--runner=PortableRunner",
>>                       "--environment_config=
>> asia.gcr.io/creationline001/beam/python3:latest",
>>                       "--environment_type=DOCKER",
>>                       "--experiments=beam_fn_api",
>>                       "--job_endpoint=localhost:8099"
>>                   ])
>>  =========================================================
>>
>> Otherwise, runner spins up job server container.
>> ==========================================================
>> ERROR:root:Starting job service with ['docker', 'run', '-v',
>> '/usr/bin/docker:/bin/docker', '-v',
>> '/var/run/docker.sock:/var/run/docker.sock', '--network=host', '
>> admin-docker-apache.bintray.io/beam/flink-job-server:latest',
>> '--job-host', 'localhost', '--job-port', '42915', '--artifact-port',
>> '56561', '--expansion-port', '53857']
>> ERROR:root:Error bringing up job service
>> ==========================================================
>>
>> Without "job_endpoint" option default job server is used.
>> ==========================================================
>>
>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/portable_runner.py#L176
>>       server = self.default_job_server(options)
>>
>> =>
>>
>> https://github.com/apache/beam/blob/d963aeb91a63f165b5ff1ebf6add8275aec204f1/sdks/python/apache_beam/runners/portability/job_server.py#L172
>>         "-docker-apache.bintray.io/beam/flink-job-server:latest"
>>  ==========================================================
>>
>> Thanks,
>> Yu
>>
>> On Tue, Sep 24, 2019 at 2:06 PM Ankur Goenka <go...@google.com> wrote:
>>
>>> Flink job server does not have artifacts-dir option yet.
>>> We have a PR to add it https://github.com/apache/beam/pull/9648
>>>
>>> However, for now you can do a few manual steps to achieve this.
>>>
>>> Start Job server.
>>>
>>> 1. Download
>>> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>>>
>>> 2. Start the job server
>>> java -jar
>>> /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar
>>> --flink-master-url ip-172-31-12-113.ap-northeast-1.compute.internal:8081
>>> --artifacts-dir /home/ec2-user --job-port 8099
>>>
>>> 3. Submit your pipeline
>>> options = PipelineOptions([
>>>                       "--runner=PortableRunner",
>>>                       "--environment_config=
>>> asia.gcr.io/creationline001/beam/python3:latest",
>>>                       "--environment_type=DOCKER",
>>>                       "--experiments=beam_fn_api"
>>>                   ])
>>>
>>> On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe <yu...@gmail.com>
>>> wrote:
>>>
>>>> Kyle.
>>>>
>>>> Thank you for the assistance.
>>>>
>>>> Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of
>>>> the pipline.
>>>> ====================================================================
>>>> WARNING:root:Discarding unparseable args: ['--flink_version=1.8',
>>>> '--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081',
>>>> '--artifacts-dir=/home/ec2-user']
>>>> ...
>>>> WARNING:root:Downloading job server jar from
>>>> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>>>> DEBUG:root:Starting job service with ['java', '-jar',
>>>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>>>> '--flink-master-url',
>>>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>>>> '/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0,
>>>> '--expansion-port', 0]
>>>> DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757.
>>>> ...
>>>> DEBUG:root:Runner option 'job_endpoint' was already added
>>>> DEBUG:root:Runner option 'sdk_worker_parallelism' was already added
>>>> WARNING:root:Discarding unparseable args:
>>>> ['--artifacts-dir=/home/admin']
>>>> ...
>>>> ====================================================================
>>>>
>>>> My pipeline option .
>>>> ====================================================================
>>>>         options = PipelineOptions([
>>>>                       "--runner=FlinkRunner",
>>>>                       "--flink_version=1.8",
>>>>
>>>> "--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081"
>>>> ,
>>>>                       "--environment_config=
>>>> asia.gcr.io/creationline001/beam/python3:latest",
>>>>                       "--environment_type=DOCKER",
>>>>                       "--experiments=beam_fn_api",
>>>>                       "--artifacts-dir=/home/admin"
>>>>                   ])
>>>> ====================================================================
>>>>
>>>> Tracing from the log , looks like artifacts dir respects the default
>>>> tempdir of OS.
>>>> Thus, to adjust it I will need environment variable instead. I used
>>>> 'TMPDIR' in my case.
>>>> ====================================================================
>>>>
>>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203
>>>>     artifacts_dir = self.local_temp_dir(prefix='artifacts')
>>>>
>>>> =>
>>>>
>>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157
>>>>     return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs)
>>>>
>>>> =>
>>>>
>>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102
>>>>     self._local_temp_root = None
>>>> ====================================================================
>>>>
>>>> Then it worked.
>>>> ====================================================================
>>>> (python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR
>>>> TMPDIR=/home/admin
>>>>
>>>> =>
>>>> DEBUG:root:Starting job service with ['java', '-jar',
>>>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>>>> '--flink-master-url',
>>>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>>>> '/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0,
>>>> '--expansion-port', 0]
>>>> ====================================================================
>>>>
>>>> I will use nfs server for now to share the artifact directory with
>>>> worker nodes.
>>>>
>>>> Thanks.
>>>> Yu Watanabe
>>>>
>>>> On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <kc...@google.com>
>>>> wrote:
>>>>
>>>>> The relevant configuration flag for the job server is
>>>>> `--artifacts-dir`.
>>>>>
>>>>> @Robert Bradshaw <ro...@google.com> I added this info to the log
>>>>> message: https://github.com/apache/beam/pull/9646
>>>>>
>>>>> Kyle Weaver | Software Engineer | github.com/ibzib |
>>>>> kcweaver@google.com
>>>>>
>>>>>
>>>>> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> You need to set your artifact directory to point to a
>>>>>> distributed filesystem that's also accessible to the workers (when starting
>>>>>> up the job server).
>>>>>>
>>>>>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <yu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello.
>>>>>>>
>>>>>>> I am working on flink runner (2.15.0) and would like to ask question
>>>>>>> about how to solve my error.
>>>>>>>
>>>>>>> Currently , I have a remote cluster deployed as below . (please see
>>>>>>> slide1)
>>>>>>> All master and worker nodes are installed on different server from
>>>>>>> apache beam.
>>>>>>>
>>>>>>>
>>>>>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>>>>>>>
>>>>>>> When I run beam pipeline, harness container tries to start up,
>>>>>>> however, fails immediately with below error on docker side.
>>>>>>>
>>>>>>> =====================================================================================
>>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>>>> 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1
>>>>>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
>>>>>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
>>>>>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
>>>>>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
>>>>>>> module=libcontainerd namespace=moby topic=/tasks/start
>>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>>>> 12:04:05 Failed to retrieve staged files: failed to get manifest
>>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
>>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code
>>>>>>> = Unknown desc =
>>>>>>>
>>>>>>> =====================================================================================
>>>>>>>
>>>>>>> At the same time, task manager logs below error.
>>>>>>>
>>>>>>> =====================================================================================
>>>>>>> 2019-09-23 21:04:05,525 INFO
>>>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>>>  - GetManifest for
>>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>>> 2019-09-23 21:04:05,526 INFO
>>>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>>>  - Loading manifest for retrieval token
>>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>>> 2019-09-23 21:04:05,531 INFO
>>>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>>>  - GetManifest for
>>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>>> failed
>>>>>>> java.util.concurrent.ExecutionException:
>>>>>>> java.io.FileNotFoundException:
>>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>>> (No such file or directory)
>>>>>>>         at
>>>>>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>>>>>>> ...
>>>>>>>
>>>>>>> =====================================================================================
>>>>>>>
>>>>>>> I see this artifact directory on the server where beam pipeline is
>>>>>>> executed but not on worker node.
>>>>>>>
>>>>>>> =====================================================================================
>>>>>>> # Beam server
>>>>>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
>>>>>>> /tmp/artifactsfkyik3us
>>>>>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>>>>>>>
>>>>>>> # Flink worker node
>>>>>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
>>>>>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>>>>>>>
>>>>>>>  =====================================================================================
>>>>>>>
>>>>>>> From the error, it seems that container is not starting up correctly
>>>>>>> due to manifest file is missing.
>>>>>>> What would be a good approach to reference artifact directory from
>>>>>>> worker node?
>>>>>>> I appreciate if I could get some advice .
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Yu Watanabe
>>>>>>>
>>>>>>> --
>>>>>>> Yu Watanabe
>>>>>>> Weekend Freelancer who loves to challenge building data platform
>>>>>>> yu.w.tennis@gmail.com
>>>>>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>>>>>>> Twitter icon] <https://twitter.com/yuwtennis>
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Yu Watanabe
>>>> Weekend Freelancer who loves to challenge building data platform
>>>> yu.w.tennis@gmail.com
>>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>>>> Twitter icon] <https://twitter.com/yuwtennis>
>>>>
>>>
>>
>> --
>> Yu Watanabe
>> Weekend Freelancer who loves to challenge building data platform
>> yu.w.tennis@gmail.com
>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>> Twitter icon] <https://twitter.com/yuwtennis>
>>
>

-- 
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.tennis@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
Twitter icon] <https://twitter.com/yuwtennis>

Re: How to reference manifest from apache flink worker node ?

Posted by Robert Bradshaw <ro...@google.com>.
https://issues.apache.org/jira/browse/BEAM-8312 should help with this. In
summary, I would say that portable beam-on-Flink is basically feature
complete, but we're still smoothing out some of the ease-of-use issues. And
feedback like this really helps, so thanks!

On Tue, Sep 24, 2019 at 8:10 PM Yu Watanabe <yu...@gmail.com> wrote:

> I needed little adjustment with the pipeline option to work it out.
>
> Pipeline option required 'job_endpoint'  when using manual start up of job
> server using jar file.
> =========================================================
>         options = PipelineOptions([
>                       "--runner=PortableRunner",
>                       "--environment_config=
> asia.gcr.io/creationline001/beam/python3:latest",
>                       "--environment_type=DOCKER",
>                       "--experiments=beam_fn_api",
>                       "--job_endpoint=localhost:8099"
>                   ])
>  =========================================================
>
> Otherwise, runner spins up job server container.
> ==========================================================
> ERROR:root:Starting job service with ['docker', 'run', '-v',
> '/usr/bin/docker:/bin/docker', '-v',
> '/var/run/docker.sock:/var/run/docker.sock', '--network=host', '
> admin-docker-apache.bintray.io/beam/flink-job-server:latest',
> '--job-host', 'localhost', '--job-port', '42915', '--artifact-port',
> '56561', '--expansion-port', '53857']
> ERROR:root:Error bringing up job service
> ==========================================================
>
> Without "job_endpoint" option default job server is used.
> ==========================================================
>
> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/portable_runner.py#L176
>       server = self.default_job_server(options)
>
> =>
>
> https://github.com/apache/beam/blob/d963aeb91a63f165b5ff1ebf6add8275aec204f1/sdks/python/apache_beam/runners/portability/job_server.py#L172
>         "-docker-apache.bintray.io/beam/flink-job-server:latest"
>  ==========================================================
>
> Thanks,
> Yu
>
> On Tue, Sep 24, 2019 at 2:06 PM Ankur Goenka <go...@google.com> wrote:
>
>> Flink job server does not have artifacts-dir option yet.
>> We have a PR to add it https://github.com/apache/beam/pull/9648
>>
>> However, for now you can do a few manual steps to achieve this.
>>
>> Start Job server.
>>
>> 1. Download
>> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>>
>> 2. Start the job server
>> java -jar
>> /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar
>> --flink-master-url ip-172-31-12-113.ap-northeast-1.compute.internal:8081
>> --artifacts-dir /home/ec2-user --job-port 8099
>>
>> 3. Submit your pipeline
>> options = PipelineOptions([
>>                       "--runner=PortableRunner",
>>                       "--environment_config=
>> asia.gcr.io/creationline001/beam/python3:latest",
>>                       "--environment_type=DOCKER",
>>                       "--experiments=beam_fn_api"
>>                   ])
>>
>> On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe <yu...@gmail.com>
>> wrote:
>>
>>> Kyle.
>>>
>>> Thank you for the assistance.
>>>
>>> Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of
>>> the pipline.
>>> ====================================================================
>>> WARNING:root:Discarding unparseable args: ['--flink_version=1.8',
>>> '--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081',
>>> '--artifacts-dir=/home/ec2-user']
>>> ...
>>> WARNING:root:Downloading job server jar from
>>> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>>> DEBUG:root:Starting job service with ['java', '-jar',
>>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>>> '--flink-master-url',
>>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>>> '/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0,
>>> '--expansion-port', 0]
>>> DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757.
>>> ...
>>> DEBUG:root:Runner option 'job_endpoint' was already added
>>> DEBUG:root:Runner option 'sdk_worker_parallelism' was already added
>>> WARNING:root:Discarding unparseable args: ['--artifacts-dir=/home/admin']
>>> ...
>>> ====================================================================
>>>
>>> My pipeline option .
>>> ====================================================================
>>>         options = PipelineOptions([
>>>                       "--runner=FlinkRunner",
>>>                       "--flink_version=1.8",
>>>
>>> "--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081"
>>> ,
>>>                       "--environment_config=
>>> asia.gcr.io/creationline001/beam/python3:latest",
>>>                       "--environment_type=DOCKER",
>>>                       "--experiments=beam_fn_api",
>>>                       "--artifacts-dir=/home/admin"
>>>                   ])
>>> ====================================================================
>>>
>>> Tracing from the log , looks like artifacts dir respects the default
>>> tempdir of OS.
>>> Thus, to adjust it I will need environment variable instead. I used
>>> 'TMPDIR' in my case.
>>> ====================================================================
>>>
>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203
>>>     artifacts_dir = self.local_temp_dir(prefix='artifacts')
>>>
>>> =>
>>>
>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157
>>>     return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs)
>>>
>>> =>
>>>
>>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102
>>>     self._local_temp_root = None
>>> ====================================================================
>>>
>>> Then it worked.
>>> ====================================================================
>>> (python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR
>>> TMPDIR=/home/admin
>>>
>>> =>
>>> DEBUG:root:Starting job service with ['java', '-jar',
>>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>>> '--flink-master-url',
>>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>>> '/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0,
>>> '--expansion-port', 0]
>>> ====================================================================
>>>
>>> I will use nfs server for now to share the artifact directory with
>>> worker nodes.
>>>
>>> Thanks.
>>> Yu Watanabe
>>>
>>> On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <kc...@google.com> wrote:
>>>
>>>> The relevant configuration flag for the job server is `--artifacts-dir`.
>>>>
>>>> @Robert Bradshaw <ro...@google.com> I added this info to the log
>>>> message: https://github.com/apache/beam/pull/9646
>>>>
>>>> Kyle Weaver | Software Engineer | github.com/ibzib |
>>>> kcweaver@google.com
>>>>
>>>>
>>>> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> You need to set your artifact directory to point to a
>>>>> distributed filesystem that's also accessible to the workers (when starting
>>>>> up the job server).
>>>>>
>>>>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <yu...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> I am working on flink runner (2.15.0) and would like to ask question
>>>>>> about how to solve my error.
>>>>>>
>>>>>> Currently , I have a remote cluster deployed as below . (please see
>>>>>> slide1)
>>>>>> All master and worker nodes are installed on different server from
>>>>>> apache beam.
>>>>>>
>>>>>>
>>>>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>>>>>>
>>>>>> When I run beam pipeline, harness container tries to start up,
>>>>>> however, fails immediately with below error on docker side.
>>>>>>
>>>>>> =====================================================================================
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>>> 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1
>>>>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
>>>>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
>>>>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
>>>>>> module=libcontainerd namespace=moby topic=/tasks/start
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>>> 12:04:05 Failed to retrieve staged files: failed to get manifest
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
>>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
>>>>>> Unknown desc =
>>>>>>
>>>>>> =====================================================================================
>>>>>>
>>>>>> At the same time, task manager logs below error.
>>>>>>
>>>>>> =====================================================================================
>>>>>> 2019-09-23 21:04:05,525 INFO
>>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>>  - GetManifest for
>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>> 2019-09-23 21:04:05,526 INFO
>>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>>  - Loading manifest for retrieval token
>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>> 2019-09-23 21:04:05,531 INFO
>>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>>  - GetManifest for
>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>> failed
>>>>>> java.util.concurrent.ExecutionException:
>>>>>> java.io.FileNotFoundException:
>>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>>> (No such file or directory)
>>>>>>         at
>>>>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>>>>>> ...
>>>>>>
>>>>>> =====================================================================================
>>>>>>
>>>>>> I see this artifact directory on the server where beam pipeline is
>>>>>> executed but not on worker node.
>>>>>>
>>>>>> =====================================================================================
>>>>>> # Beam server
>>>>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
>>>>>> /tmp/artifactsfkyik3us
>>>>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>>>>>>
>>>>>> # Flink worker node
>>>>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
>>>>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>>>>>>
>>>>>>  =====================================================================================
>>>>>>
>>>>>> From the error, it seems that container is not starting up correctly
>>>>>> due to manifest file is missing.
>>>>>> What would be a good approach to reference artifact directory from
>>>>>> worker node?
>>>>>> I appreciate if I could get some advice .
>>>>>>
>>>>>> Best Regards,
>>>>>> Yu Watanabe
>>>>>>
>>>>>> --
>>>>>> Yu Watanabe
>>>>>> Weekend Freelancer who loves to challenge building data platform
>>>>>> yu.w.tennis@gmail.com
>>>>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>>>>>> Twitter icon] <https://twitter.com/yuwtennis>
>>>>>>
>>>>>
>>>
>>> --
>>> Yu Watanabe
>>> Weekend Freelancer who loves to challenge building data platform
>>> yu.w.tennis@gmail.com
>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>>> Twitter icon] <https://twitter.com/yuwtennis>
>>>
>>
>
> --
> Yu Watanabe
> Weekend Freelancer who loves to challenge building data platform
> yu.w.tennis@gmail.com
> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
> Twitter icon] <https://twitter.com/yuwtennis>
>

Re: How to reference manifest from apache flink worker node ?

Posted by Yu Watanabe <yu...@gmail.com>.
I needed little adjustment with the pipeline option to work it out.

Pipeline option required 'job_endpoint'  when using manual start up of job
server using jar file.
=========================================================
        options = PipelineOptions([
                      "--runner=PortableRunner",
                      "--environment_config=
asia.gcr.io/creationline001/beam/python3:latest",
                      "--environment_type=DOCKER",
                      "--experiments=beam_fn_api",
                      "--job_endpoint=localhost:8099"
                  ])
 =========================================================

Otherwise, runner spins up job server container.
==========================================================
ERROR:root:Starting job service with ['docker', 'run', '-v',
'/usr/bin/docker:/bin/docker', '-v',
'/var/run/docker.sock:/var/run/docker.sock', '--network=host', '
admin-docker-apache.bintray.io/beam/flink-job-server:latest', '--job-host',
'localhost', '--job-port', '42915', '--artifact-port', '56561',
'--expansion-port', '53857']
ERROR:root:Error bringing up job service
==========================================================

Without "job_endpoint" option default job server is used.
==========================================================
https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/portable_runner.py#L176
      server = self.default_job_server(options)

=>
https://github.com/apache/beam/blob/d963aeb91a63f165b5ff1ebf6add8275aec204f1/sdks/python/apache_beam/runners/portability/job_server.py#L172
        "-docker-apache.bintray.io/beam/flink-job-server:latest"
 ==========================================================

Thanks,
Yu

On Tue, Sep 24, 2019 at 2:06 PM Ankur Goenka <go...@google.com> wrote:

> Flink job server does not have artifacts-dir option yet.
> We have a PR to add it https://github.com/apache/beam/pull/9648
>
> However, for now you can do a few manual steps to achieve this.
>
> Start Job server.
>
> 1. Download
> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>
> 2. Start the job server
> java -jar
> /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar
> --flink-master-url ip-172-31-12-113.ap-northeast-1.compute.internal:8081
> --artifacts-dir /home/ec2-user --job-port 8099
>
> 3. Submit your pipeline
> options = PipelineOptions([
>                       "--runner=PortableRunner",
>                       "--environment_config=
> asia.gcr.io/creationline001/beam/python3:latest",
>                       "--environment_type=DOCKER",
>                       "--experiments=beam_fn_api"
>                   ])
>
> On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe <yu...@gmail.com> wrote:
>
>> Kyle.
>>
>> Thank you for the assistance.
>>
>> Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of the
>> pipline.
>> ====================================================================
>> WARNING:root:Discarding unparseable args: ['--flink_version=1.8',
>> '--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081',
>> '--artifacts-dir=/home/ec2-user']
>> ...
>> WARNING:root:Downloading job server jar from
>> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
>> DEBUG:root:Starting job service with ['java', '-jar',
>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>> '--flink-master-url',
>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>> '/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0,
>> '--expansion-port', 0]
>> DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757.
>> ...
>> DEBUG:root:Runner option 'job_endpoint' was already added
>> DEBUG:root:Runner option 'sdk_worker_parallelism' was already added
>> WARNING:root:Discarding unparseable args: ['--artifacts-dir=/home/admin']
>> ...
>> ====================================================================
>>
>> My pipeline option .
>> ====================================================================
>>         options = PipelineOptions([
>>                       "--runner=FlinkRunner",
>>                       "--flink_version=1.8",
>>
>> "--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081"
>> ,
>>                       "--environment_config=
>> asia.gcr.io/creationline001/beam/python3:latest",
>>                       "--environment_type=DOCKER",
>>                       "--experiments=beam_fn_api",
>>                       "--artifacts-dir=/home/admin"
>>                   ])
>> ====================================================================
>>
>> Tracing from the log , looks like artifacts dir respects the default
>> tempdir of OS.
>> Thus, to adjust it I will need environment variable instead. I used
>> 'TMPDIR' in my case.
>> ====================================================================
>>
>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203
>>     artifacts_dir = self.local_temp_dir(prefix='artifacts')
>>
>> =>
>>
>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157
>>     return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs)
>>
>> =>
>>
>> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102
>>     self._local_temp_root = None
>> ====================================================================
>>
>> Then it worked.
>> ====================================================================
>> (python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR
>> TMPDIR=/home/admin
>>
>> =>
>> DEBUG:root:Starting job service with ['java', '-jar',
>> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
>> '--flink-master-url',
>> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
>> '/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0,
>> '--expansion-port', 0]
>> ====================================================================
>>
>> I will use nfs server for now to share the artifact directory with worker
>> nodes.
>>
>> Thanks.
>> Yu Watanabe
>>
>> On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <kc...@google.com> wrote:
>>
>>> The relevant configuration flag for the job server is `--artifacts-dir`.
>>>
>>> @Robert Bradshaw <ro...@google.com> I added this info to the log
>>> message: https://github.com/apache/beam/pull/9646
>>>
>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcweaver@google.com
>>>
>>>
>>> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> You need to set your artifact directory to point to a
>>>> distributed filesystem that's also accessible to the workers (when starting
>>>> up the job server).
>>>>
>>>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <yu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello.
>>>>>
>>>>> I am working on flink runner (2.15.0) and would like to ask question
>>>>> about how to solve my error.
>>>>>
>>>>> Currently , I have a remote cluster deployed as below . (please see
>>>>> slide1)
>>>>> All master and worker nodes are installed on different server from
>>>>> apache beam.
>>>>>
>>>>>
>>>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>>>>>
>>>>> When I run beam pipeline, harness container tries to start up,
>>>>> however, fails immediately with below error on docker side.
>>>>>
>>>>> =====================================================================================
>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>> 12:04:05 Initializing python harness: /opt/apache/beam/boot --id=1
>>>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
>>>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
>>>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
>>>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
>>>>> module=libcontainerd namespace=moby topic=/tasks/start
>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23
>>>>> 12:04:05 Failed to retrieve staged files: failed to get manifest
>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
>>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
>>>>> Unknown desc =
>>>>>
>>>>> =====================================================================================
>>>>>
>>>>> At the same time, task manager logs below error.
>>>>>
>>>>> =====================================================================================
>>>>> 2019-09-23 21:04:05,525 INFO
>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>  - GetManifest for
>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>> 2019-09-23 21:04:05,526 INFO
>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>  - Loading manifest for retrieval token
>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>> 2019-09-23 21:04:05,531 INFO
>>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>>  - GetManifest for
>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>> failed
>>>>> java.util.concurrent.ExecutionException:
>>>>> java.io.FileNotFoundException:
>>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>>> (No such file or directory)
>>>>>         at
>>>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>>>>> ...
>>>>>
>>>>> =====================================================================================
>>>>>
>>>>> I see this artifact directory on the server where beam pipeline is
>>>>> executed but not on worker node.
>>>>>
>>>>> =====================================================================================
>>>>> # Beam server
>>>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
>>>>> /tmp/artifactsfkyik3us
>>>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>>>>>
>>>>> # Flink worker node
>>>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
>>>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>>>>>
>>>>>  =====================================================================================
>>>>>
>>>>> From the error, it seems that container is not starting up correctly
>>>>> due to manifest file is missing.
>>>>> What would be a good approach to reference artifact directory from
>>>>> worker node?
>>>>> I appreciate if I could get some advice .
>>>>>
>>>>> Best Regards,
>>>>> Yu Watanabe
>>>>>
>>>>> --
>>>>> Yu Watanabe
>>>>> Weekend Freelancer who loves to challenge building data platform
>>>>> yu.w.tennis@gmail.com
>>>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>>>>> Twitter icon] <https://twitter.com/yuwtennis>
>>>>>
>>>>
>>
>> --
>> Yu Watanabe
>> Weekend Freelancer who loves to challenge building data platform
>> yu.w.tennis@gmail.com
>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>> Twitter icon] <https://twitter.com/yuwtennis>
>>
>

-- 
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.tennis@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
Twitter icon] <https://twitter.com/yuwtennis>

Re: How to reference manifest from apache flink worker node ?

Posted by Ankur Goenka <go...@google.com>.
Flink job server does not have artifacts-dir option yet.
We have a PR to add it https://github.com/apache/beam/pull/9648

However, for now you can do a few manual steps to achieve this.

Start Job server.

1. Download
https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar

2. Start the job server
java -jar
/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar
--flink-master-url ip-172-31-12-113.ap-northeast-1.compute.internal:8081
--artifacts-dir /home/ec2-user --job-port 8099

3. Submit your pipeline
options = PipelineOptions([
                      "--runner=PortableRunner",
                      "--environment_config=
asia.gcr.io/creationline001/beam/python3:latest",
                      "--environment_type=DOCKER",
                      "--experiments=beam_fn_api"
                  ])

On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe <yu...@gmail.com> wrote:

> Kyle.
>
> Thank you for the assistance.
>
> Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of the
> pipline.
> ====================================================================
> WARNING:root:Discarding unparseable args: ['--flink_version=1.8',
> '--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081',
> '--artifacts-dir=/home/ec2-user']
> ...
> WARNING:root:Downloading job server jar from
> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
> DEBUG:root:Starting job service with ['java', '-jar',
> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
> '--flink-master-url',
> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
> '/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0,
> '--expansion-port', 0]
> DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757.
> ...
> DEBUG:root:Runner option 'job_endpoint' was already added
> DEBUG:root:Runner option 'sdk_worker_parallelism' was already added
> WARNING:root:Discarding unparseable args: ['--artifacts-dir=/home/admin']
> ...
> ====================================================================
>
> My pipeline option .
> ====================================================================
>         options = PipelineOptions([
>                       "--runner=FlinkRunner",
>                       "--flink_version=1.8",
>
> "--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081"
> ,
>                       "--environment_config=
> asia.gcr.io/creationline001/beam/python3:latest",
>                       "--environment_type=DOCKER",
>                       "--experiments=beam_fn_api",
>                       "--artifacts-dir=/home/admin"
>                   ])
> ====================================================================
>
> Tracing from the log , looks like artifacts dir respects the default
> tempdir of OS.
> Thus, to adjust it I will need environment variable instead. I used
> 'TMPDIR' in my case.
> ====================================================================
>
> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203
>     artifacts_dir = self.local_temp_dir(prefix='artifacts')
>
> =>
>
> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157
>     return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs)
>
> =>
>
> https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102
>     self._local_temp_root = None
> ====================================================================
>
> Then it worked.
> ====================================================================
> (python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR
> TMPDIR=/home/admin
>
> =>
> DEBUG:root:Starting job service with ['java', '-jar',
> '/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
> '--flink-master-url',
> 'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
> '/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0,
> '--expansion-port', 0]
> ====================================================================
>
> I will use nfs server for now to share the artifact directory with worker
> nodes.
>
> Thanks.
> Yu Watanabe
>
> On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <kc...@google.com> wrote:
>
>> The relevant configuration flag for the job server is `--artifacts-dir`.
>>
>> @Robert Bradshaw <ro...@google.com> I added this info to the log
>> message: https://github.com/apache/beam/pull/9646
>>
>> Kyle Weaver | Software Engineer | github.com/ibzib | kcweaver@google.com
>>
>>
>> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> You need to set your artifact directory to point to a
>>> distributed filesystem that's also accessible to the workers (when starting
>>> up the job server).
>>>
>>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <yu...@gmail.com>
>>> wrote:
>>>
>>>> Hello.
>>>>
>>>> I am working on flink runner (2.15.0) and would like to ask question
>>>> about how to solve my error.
>>>>
>>>> Currently , I have a remote cluster deployed as below . (please see
>>>> slide1)
>>>> All master and worker nodes are installed on different server from
>>>> apache beam.
>>>>
>>>>
>>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>>>>
>>>> When I run beam pipeline, harness container tries to start up, however,
>>>> fails immediately with below error on docker side.
>>>>
>>>> =====================================================================================
>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
>>>> Initializing python harness: /opt/apache/beam/boot --id=1
>>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
>>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
>>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
>>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
>>>> module=libcontainerd namespace=moby topic=/tasks/start
>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
>>>> Failed to retrieve staged files: failed to get manifest
>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
>>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
>>>> Unknown desc =
>>>>
>>>> =====================================================================================
>>>>
>>>> At the same time, task manager logs below error.
>>>>
>>>> =====================================================================================
>>>> 2019-09-23 21:04:05,525 INFO
>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>  - GetManifest for
>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>> 2019-09-23 21:04:05,526 INFO
>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>  - Loading manifest for retrieval token
>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>> 2019-09-23 21:04:05,531 INFO
>>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>>  - GetManifest for
>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>> failed
>>>> java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
>>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>>> (No such file or directory)
>>>>         at
>>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>>>> ...
>>>>
>>>> =====================================================================================
>>>>
>>>> I see this artifact directory on the server where beam pipeline is
>>>> executed but not on worker node.
>>>>
>>>> =====================================================================================
>>>> # Beam server
>>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
>>>> /tmp/artifactsfkyik3us
>>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>>>>
>>>> # Flink worker node
>>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
>>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>>>>
>>>>  =====================================================================================
>>>>
>>>> From the error, it seems that container is not starting up correctly
>>>> due to manifest file is missing.
>>>> What would be a good approach to reference artifact directory from
>>>> worker node?
>>>> I appreciate if I could get some advice .
>>>>
>>>> Best Regards,
>>>> Yu Watanabe
>>>>
>>>> --
>>>> Yu Watanabe
>>>> Weekend Freelancer who loves to challenge building data platform
>>>> yu.w.tennis@gmail.com
>>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>>>> Twitter icon] <https://twitter.com/yuwtennis>
>>>>
>>>
>
> --
> Yu Watanabe
> Weekend Freelancer who loves to challenge building data platform
> yu.w.tennis@gmail.com
> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
> Twitter icon] <https://twitter.com/yuwtennis>
>

Re: How to reference manifest from apache flink worker node ?

Posted by Yu Watanabe <yu...@gmail.com>.
Kyle.

Thank you for the assistance.

Looks like "--artifacts-dir" is not parsed. Below is the DEBUG log of the
pipline.
====================================================================
WARNING:root:Discarding unparseable args: ['--flink_version=1.8',
'--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081',
'--artifacts-dir=/home/ec2-user']
...
WARNING:root:Downloading job server jar from
https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
DEBUG:root:Starting job service with ['java', '-jar',
'/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
'--flink-master-url',
'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
'/tmp/artifactsicq2kj8c', '--job-port', 41757, '--artifact-port', 0,
'--expansion-port', 0]
DEBUG:root:Waiting for jobs grpc channel to be ready at localhost:41757.
...
DEBUG:root:Runner option 'job_endpoint' was already added
DEBUG:root:Runner option 'sdk_worker_parallelism' was already added
WARNING:root:Discarding unparseable args: ['--artifacts-dir=/home/admin']
...
====================================================================

My pipeline option .
====================================================================
        options = PipelineOptions([
                      "--runner=FlinkRunner",
                      "--flink_version=1.8",

"--flink_master_url=ip-172-31-12-113.ap-northeast-1.compute.internal:8081"
,
                      "--environment_config=
asia.gcr.io/creationline001/beam/python3:latest",
                      "--environment_type=DOCKER",
                      "--experiments=beam_fn_api",
                      "--artifacts-dir=/home/admin"
                  ])
====================================================================

Tracing from the log , looks like artifacts dir respects the default
tempdir of OS.
Thus, to adjust it I will need environment variable instead. I used
'TMPDIR' in my case.
====================================================================
https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L203
    artifacts_dir = self.local_temp_dir(prefix='artifacts')

=>
https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L157
    return tempfile.mkdtemp(dir=self._local_temp_root, **kwargs)

=>
https://github.com/apache/beam/blob/7931ec055e2da7214c82e368ef7d7fd679faaef1/sdks/python/apache_beam/runners/portability/job_server.py#L102
    self._local_temp_root = None
====================================================================

Then it worked.
====================================================================
(python-bm-2150) admin@ip-172-31-9-89:~$ env | grep TMPDIR
TMPDIR=/home/admin

=>
DEBUG:root:Starting job service with ['java', '-jar',
'/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
'--flink-master-url',
'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
'/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0,
'--expansion-port', 0]
====================================================================

I will use nfs server for now to share the artifact directory with worker
nodes.

Thanks.
Yu Watanabe

On Tue, Sep 24, 2019 at 9:50 AM Kyle Weaver <kc...@google.com> wrote:

> The relevant configuration flag for the job server is `--artifacts-dir`.
>
> @Robert Bradshaw <ro...@google.com> I added this info to the log
> message: https://github.com/apache/beam/pull/9646
>
> Kyle Weaver | Software Engineer | github.com/ibzib | kcweaver@google.com
>
>
> On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> You need to set your artifact directory to point to a
>> distributed filesystem that's also accessible to the workers (when starting
>> up the job server).
>>
>> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <yu...@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I am working on flink runner (2.15.0) and would like to ask question
>>> about how to solve my error.
>>>
>>> Currently , I have a remote cluster deployed as below . (please see
>>> slide1)
>>> All master and worker nodes are installed on different server from
>>> apache beam.
>>>
>>>
>>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>>>
>>> When I run beam pipeline, harness container tries to start up, however,
>>> fails immediately with below error on docker side.
>>>
>>> =====================================================================================
>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
>>> Initializing python harness: /opt/apache/beam/boot --id=1
>>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
>>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
>>> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
>>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
>>> module=libcontainerd namespace=moby topic=/tasks/start
>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
>>> Failed to retrieve staged files: failed to get manifest
>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
>>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
>>> Unknown desc =
>>>
>>> =====================================================================================
>>>
>>> At the same time, task manager logs below error.
>>>
>>> =====================================================================================
>>> 2019-09-23 21:04:05,525 INFO
>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>  - GetManifest for
>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>> 2019-09-23 21:04:05,526 INFO
>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>  - Loading manifest for retrieval token
>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>> 2019-09-23 21:04:05,531 INFO
>>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>>  - GetManifest for
>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>> failed
>>> java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
>>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>>> (No such file or directory)
>>>         at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>>> ...
>>>
>>> =====================================================================================
>>>
>>> I see this artifact directory on the server where beam pipeline is
>>> executed but not on worker node.
>>>
>>> =====================================================================================
>>> # Beam server
>>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
>>> /tmp/artifactsfkyik3us
>>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>>>
>>> # Flink worker node
>>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
>>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>>>
>>>  =====================================================================================
>>>
>>> From the error, it seems that container is not starting up correctly due
>>> to manifest file is missing.
>>> What would be a good approach to reference artifact directory from
>>> worker node?
>>> I appreciate if I could get some advice .
>>>
>>> Best Regards,
>>> Yu Watanabe
>>>
>>> --
>>> Yu Watanabe
>>> Weekend Freelancer who loves to challenge building data platform
>>> yu.w.tennis@gmail.com
>>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>>> Twitter icon] <https://twitter.com/yuwtennis>
>>>
>>

-- 
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.tennis@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
Twitter icon] <https://twitter.com/yuwtennis>

Re: How to reference manifest from apache flink worker node ?

Posted by Kyle Weaver <kc...@google.com>.
The relevant configuration flag for the job server is `--artifacts-dir`.

@Robert Bradshaw <ro...@google.com> I added this info to the log
message: https://github.com/apache/beam/pull/9646

Kyle Weaver | Software Engineer | github.com/ibzib | kcweaver@google.com


On Mon, Sep 23, 2019 at 11:36 AM Robert Bradshaw <ro...@google.com>
wrote:

> You need to set your artifact directory to point to a
> distributed filesystem that's also accessible to the workers (when starting
> up the job server).
>
> On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <yu...@gmail.com> wrote:
>
>> Hello.
>>
>> I am working on flink runner (2.15.0) and would like to ask question
>> about how to solve my error.
>>
>> Currently , I have a remote cluster deployed as below . (please see
>> slide1)
>> All master and worker nodes are installed on different server from apache
>> beam.
>>
>>
>> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>>
>> When I run beam pipeline, harness container tries to start up, however,
>> fails immediately with below error on docker side.
>>
>> =====================================================================================
>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
>> Initializing python harness: /opt/apache/beam/boot --id=1
>> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
>> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
>> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
>> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
>> module=libcontainerd namespace=moby topic=/tasks/start
>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
>> Failed to retrieve staged files: failed to get manifest
>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
>> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
>> Unknown desc =
>>
>> =====================================================================================
>>
>> At the same time, task manager logs below error.
>>
>> =====================================================================================
>> 2019-09-23 21:04:05,525 INFO
>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>  - GetManifest for
>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>> 2019-09-23 21:04:05,526 INFO
>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>  - Loading manifest for retrieval token
>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>> 2019-09-23 21:04:05,531 INFO
>>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>>  - GetManifest for
>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>> failed
>> java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
>> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
>> (No such file or directory)
>>         at
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>> ...
>>
>> =====================================================================================
>>
>> I see this artifact directory on the server where beam pipeline is
>> executed but not on worker node.
>>
>> =====================================================================================
>> # Beam server
>> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
>> /tmp/artifactsfkyik3us
>> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>>
>> # Flink worker node
>> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
>> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>>
>>  =====================================================================================
>>
>> From the error, it seems that container is not starting up correctly due
>> to manifest file is missing.
>> What would be a good approach to reference artifact directory from worker
>> node?
>> I appreciate if I could get some advice .
>>
>> Best Regards,
>> Yu Watanabe
>>
>> --
>> Yu Watanabe
>> Weekend Freelancer who loves to challenge building data platform
>> yu.w.tennis@gmail.com
>> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
>> Twitter icon] <https://twitter.com/yuwtennis>
>>
>

Re: How to reference manifest from apache flink worker node ?

Posted by Robert Bradshaw <ro...@google.com>.
You need to set your artifact directory to point to a
distributed filesystem that's also accessible to the workers (when starting
up the job server).

On Mon, Sep 23, 2019 at 5:43 AM Yu Watanabe <yu...@gmail.com> wrote:

> Hello.
>
> I am working on flink runner (2.15.0) and would like to ask question about
> how to solve my error.
>
> Currently , I have a remote cluster deployed as below . (please see
> slide1)
> All master and worker nodes are installed on different server from apache
> beam.
>
>
> https://drive.google.com/file/d/1vBULp6kiEfQNGVV3Nl2mMKAZZKYsb11h/view?usp=sharing
>
> When I run beam pipeline, harness container tries to start up, however,
> fails immediately with below error on docker side.
>
> =====================================================================================
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
> Initializing python harness: /opt/apache/beam/boot --id=1
> --logging_endpoint=localhost:34227 --artifact_endpoint=localhost:45303
> --provision_endpoint=localhost:44585 --control_endpoint=localhost:43869
> Sep 23 21:04:05 ip-172-31-0-143 dockerd:
> time="2019-09-23T21:04:05.380942292+09:00" level=debug msg=event
> module=libcontainerd namespace=moby topic=/tasks/start
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: 2019/09/23 12:04:05
> Failed to retrieve staged files: failed to get manifest
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: #011caused by:
> Sep 23 21:04:05 ip-172-31-0-143 51106514ffc0[7920]: rpc error: code =
> Unknown desc =
>
> =====================================================================================
>
> At the same time, task manager logs below error.
>
> =====================================================================================
> 2019-09-23 21:04:05,525 INFO
>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>  - GetManifest for
> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
> 2019-09-23 21:04:05,526 INFO
>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>  - Loading manifest for retrieval token
> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
> 2019-09-23 21:04:05,531 INFO
>  org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>  - GetManifest for
> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
> failed
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
> /tmp/artifactsfkyik3us/job_de145881-9ea7-4e44-8e6d-31a6ea298010/MANIFEST
> (No such file or directory)
>         at
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
> ...
>
> =====================================================================================
>
> I see this artifact directory on the server where beam pipeline is
> executed but not on worker node.
>
> =====================================================================================
> # Beam server
> (python-bm-2150) admin@ip-172-31-9-89:~$ sudo ls -ld
> /tmp/artifactsfkyik3us
> drwx------ 3 admin admin 4096 Sep 23 12:03 /tmp/artifactsfkyik3us
>
> # Flink worker node
> [ec2-user@ip-172-31-0-143 flink]$ sudo ls -ld /tmp/artifactsfkyik3us
> ls: cannot access /tmp/artifactsfkyik3us: No such file or directory
>
>  =====================================================================================
>
> From the error, it seems that container is not starting up correctly due
> to manifest file is missing.
> What would be a good approach to reference artifact directory from worker
> node?
> I appreciate if I could get some advice .
>
> Best Regards,
> Yu Watanabe
>
> --
> Yu Watanabe
> Weekend Freelancer who loves to challenge building data platform
> yu.w.tennis@gmail.com
> [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1>  [image:
> Twitter icon] <https://twitter.com/yuwtennis>
>