You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Adam Pearce <ad...@xorsecurity.com> on 2021/09/02 21:19:00 UTC

[Question] Basic Python examples.wordcount on local FlinkRunner

Hello all,

I’m attempting to run a simple, minimally viable example of a Beam pipeline on Flink. I have installed flink-1.13.2 following the setup instructions, and have successfully run the server and navigated to localhost:8081 to view the Web UI. I can see the job successfully submitted and running. It runs, and completes, but the output is not appropriate and the final output never occurs. I have enabled DEBUG logging for further output, but I don’t really see anything that would indicate issues other than what I am showing below. I’ve attached the complete log.

I am running the following from a Python 3.8.12 virtualenv with apache-beam 2.31.0 installed via pip:

python -m apache_beam.examples.wordcount --input input.txt --output counts --runner FlinkRunner --flink_master="localhost:8081" [--flink_submit_uber_jar]

I have tried with and without “--flink_submit_uber_jar” without any change. The local embedded run of this pipeline works (same command as above, omitting “flink_master”. I understand that running this through the Flink server will use docker to stand up containers to perform the work. It appears to be successfully pulling the image: “apache/beam_python3.8_sdk:2.31.0”. I’m curious if there is some issue with the Python container, because in the logs, I am seeing:

2021/09/02 20:57:54 Initializing python harness: /opt/apache/beam/boot --id=5-1 --provision_endpoint=host.docker.internal:55847
2021/09/02 20:57:54 Downloaded: /tmp/staged/pickled_main_session (sha256: 4e9a1199bade55ad73ae6872c8f156c69227ef23d8155f19dead745264999084, size: 3029)
2021/09/02 20:57:54 Found artifact: pickled_main_session
2021/09/02 20:57:54 Installing setup packages ...
2021/09/02 20:57:54 Executing: python -m apache_beam.runners.worker.sdk_worker_main
2021/09/02 20:57:55 Python exited: <nil>

I’ve searched high and low for that “Python exited: <nil>” but have found very little.

Further context:
Operating system: macOS 11.5.2
Docker: Docker version 20.10.8, build 3967b7d
Python: 3.8.12 (virtualenv)
Beam: 2.31.0
Flink: 1.13.2
This communication is the property of XOR Security and may contain confidential or privileged information. Unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy all copies of the communication and any attachments.

Re: [Question] Basic Python examples.wordcount on local FlinkRunner

Posted by Adam Pearce <ad...@xorsecurity.com>.
Thanks Dian, that seemed to do the trick. I built a Docker Image simply using:

FROM apache/beam_python3.8_sdk:2.31.0
COPY flink_data/input.txt .

I specified nothing in the Pipeline options other than:

"--runner=FlinkRunner",
"--flink_master=localhost:8081",
"--environment_type=DOCKER",
"--environment_config=xor/beam_worker:latest"

This seemed to do the trick!

I’m curious if you have any recommendations on the typical strategies for supplying artifacts like this at runtime, rather than having to build an image for jobs? I think ideally we have a pipeline consistently running somewhere and we can submit data―let’s say CSV files―and the pipeline will process them as a job is kicked off. Given how much discussion I’ve seen around Kafka, I imagine this is done with some sort of distributed messaging framework.

If anyone can offer suggestions or resources, I’d very much appreciate it!

Thanks again for the help!

Best,
Adam


From: Dian Fu <di...@gmail.com>
Date: Friday, September 3, 2021 at 2:10 AM
To: Adam Pearce <ad...@xorsecurity.com>
Cc: user@flink.apache.org <us...@flink.apache.org>
Subject: [EXTERNAL]Re: [Question] Basic Python examples.wordcount on local FlinkRunner
This seems more like a Beam issue although it uses Flink runner. It would be helpful to also send it to the Beam user mailing list.

Regarding to this issue itself, could you check is input.txt accessible in the Docker container?

Regards,
Dian


2021年9月3日 上午5:19,Adam Pearce <ad...@xorsecurity.com>> 写道:

Hello all,

I’m attempting to run a simple, minimally viable example of a Beam pipeline on Flink. I have installed flink-1.13.2 following the setup instructions, and have successfully run the server and navigated to localhost:8081 to view the Web UI. I can see the job successfully submitted and running. It runs, and completes, but the output is not appropriate and the final output never occurs. I have enabled DEBUG logging for further output, but I don’t really see anything that would indicate issues other than what I am showing below. I’ve attached the complete log.

I am running the following from a Python 3.8.12 virtualenv with apache-beam 2.31.0 installed via pip:

python -m apache_beam.examples.wordcount --input input.txt --output counts --runner FlinkRunner --flink_master="localhost:8081" [--flink_submit_uber_jar]

I have tried with and without “--flink_submit_uber_jar” without any change. The local embedded run of this pipeline works (same command as above, omitting “flink_master”. I understand that running this through the Flink server will use docker to stand up containers to perform the work. It appears to be successfully pulling the image: “apache/beam_python3.8_sdk:2.31.0”. I’m curious if there is some issue with the Python container, because in the logs, I am seeing:

2021/09/02 20:57:54 Initializing python harness: /opt/apache/beam/boot --id=5-1 --provision_endpoint=host.docker.internal:55847
2021/09/02 20:57:54 Downloaded: /tmp/staged/pickled_main_session (sha256:4e9a1199bade55ad73ae6872c8f156c69227ef23d8155f19dead745264999084, size: 3029)
2021/09/02 20:57:54 Found artifact: pickled_main_session
2021/09/02 20:57:54 Installing setup packages ...
2021/09/02 20:57:54 Executing: python -m apache_beam.runners.worker.sdk_worker_main
2021/09/02 20:57:55 Python exited: <nil>

I’ve searched high and low for that “Python exited: <nil>” but have found very little.

Further context:
Operating system: macOS 11.5.2
Docker: Docker version 20.10.8, build 3967b7d
Python: 3.8.12 (virtualenv)
Beam: 2.31.0
Flink: 1.13.2
This communication is the property of XOR Security and may contain confidential or privileged information. Unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy all copies of the communication and any attachments. <flink-root-standalonesession-3-MacBook-Pro.local.log><flink-root-taskexecutor-3-MacBook-Pro.local.log>

This communication is the property of XOR Security and may contain confidential or privileged information. Unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy all copies of the communication and any attachments.

Re: [Question] Basic Python examples.wordcount on local FlinkRunner

Posted by Dian Fu <di...@gmail.com>.
This seems more like a Beam issue although it uses Flink runner. It would be helpful to also send it to the Beam user mailing list.

Regarding to this issue itself, could you check is input.txt accessible in the Docker container?

Regards,
Dian

> 2021年9月3日 上午5:19,Adam Pearce <ad...@xorsecurity.com> 写道:
> 
> Hello all,
>  
> I’m attempting to run a simple, minimally viable example of a Beam pipeline on Flink. I have installed flink-1.13.2 following the setup instructions, and have successfully run the server and navigated to localhost:8081 to view the Web UI. I can see the job successfully submitted and running. It runs, and completes, but the output is not appropriate and the final output never occurs. I have enabled DEBUG logging for further output, but I don’t really see anything that would indicate issues other than what I am showing below. I’ve attached the complete log.
>  
> I am running the following from a Python 3.8.12 virtualenv with apache-beam 2.31.0 installed via pip:
>  
> python -m apache_beam.examples.wordcount --input input.txt --output counts --runner FlinkRunner --flink_master="localhost:8081" [--flink_submit_uber_jar]
>  
> I have tried with and without “--flink_submit_uber_jar” without any change. The local embedded run of this pipeline works (same command as above, omitting “flink_master”. I understand that running this through the Flink server will use docker to stand up containers to perform the work. It appears to be successfully pulling the image: “apache/beam_python3.8_sdk:2.31.0”. I’m curious if there is some issue with the Python container, because in the logs, I am seeing:
>  
> 2021/09/02 20:57:54 Initializing python harness: /opt/apache/beam/boot --id=5-1 --provision_endpoint=host.docker.internal:55847
> 2021/09/02 20:57:54 Downloaded: /tmp/staged/pickled_main_session (sha256:4e9a1199bade55ad73ae6872c8f156c69227ef23d8155f19dead745264999084, size: 3029)
> 2021/09/02 20:57:54 Found artifact: pickled_main_session
> 2021/09/02 20:57:54 Installing setup packages ...
> 2021/09/02 20:57:54 Executing: python -m apache_beam.runners.worker.sdk_worker_main
> 2021/09/02 20:57:55 Python exited: <nil>
>  
> I’ve searched high and low for that “Python exited: <nil>” but have found very little.
>  
> Further context:
> Operating system: macOS 11.5.2
> Docker: Docker version 20.10.8, build 3967b7d
> Python: 3.8.12 (virtualenv)
> Beam: 2.31.0
> Flink: 1.13.2
> This communication is the property of XOR Security and may contain confidential or privileged information. Unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy all copies of the communication and any attachments. <flink-root-standalonesession-3-MacBook-Pro.local.log><flink-root-taskexecutor-3-MacBook-Pro.local.log>