You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Jill Cardamon <ji...@gmail.com> on 2023/05/11 18:35:17 UTC

Issues using PyFlink

Hi!

I'm new to Flink, and I have been trying to run a simple python flink
script that consumes messages from Kafka as well as the examples locally
with a few issues.

1. When I run the word count example using `./flink-1.17.0/bin/flink run
--python flink-1.17.0/examples/python/datastream/word_count.py`, I get the
following error:

```
Traceback (most recent call last):
  File "/Users/jill/flink-1.17.0/examples/python/datastream/word_count.py",
line 134, in <module>
    word_count(known_args.input, known_args.output)
  File "/Users/jill/flink-1.17.0/examples/python/datastream/word_count.py",
line 89, in word_count
    ds = ds.flat_map(split) \
  File
"/Users/jill/flink-1.17.0/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
line 354, in flat_map
  File
"/Users/jill/flink-1.17.0/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
line 654, in process
  File "<frozen zipimport>", line 259, in load_module
  File
"/Users/jill/flink-1.17.0/opt/python/pyflink.zip/pyflink/fn_execution/flink_fn_execution_pb2.py",
line 22, in <module>
ModuleNotFoundError: No module named 'google'
org.apache.flink.client.program.ProgramAbortException:
java.lang.RuntimeException: Python process exits with code: 1
at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:140)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
at
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
at
org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
at
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at
org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
Caused by: java.lang.RuntimeException: Python process exits with code: 1
at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:130)
... 14 more
```

This issue happens with and without `-pyclientexec venv/bin/python3`. When
running the example using `python word_count.py`, I get a similar error but
the `pyflink` module is not found. I installed pyflink using `python -m pip
install apache-flink` within a python venv (not a conda one) and downloaded
the corresponding binary. Looking through this mailing list's archive as
well as on Stack Overflow, I saw similar issues, and the fixes were usually
from installing flink incorrectly. For me, Flink and its dependencies
(including protobuf) are in
`.../Library/Python/3.9/lib/python/site-packages/...`. I'm pretty sure I'm
doing something silly, but I'm not quite sure what the fix is here.

2. When I run my Python script (consumes json-formatted messages from Kafka
to a datastream and prints) using `./flink-1.17.0/bin/flink run --python
kafka_consumer.py`, I see the same issue as the traceback above. When using
`python kafka_consumer.py`, the program hangs on `env.execute()`. I see no
running jobs on the dashboard. Is there a good way to go about debugging
this? Should I be waiting a while for it to start running?

Thanks in advance!

- Jill

Re: Issues using PyFlink

Posted by Dian Fu <di...@gmail.com>.
Hi Jill,

I suspect that the PyFlink isn't installed in the Python environment which
is used to run the example. Could you share the complete command you used
to execute the example: `./flink-1.17.0/bin/flink run -pyclientexec
venv/bin/python3 --python flink-1.17.0/examples/python/
datastream/word_count.py`. I think this is in-complete.

Regards,
Dian

On Fri, May 12, 2023 at 2:36 AM Jill Cardamon <ji...@gmail.com>
wrote:

> Hi!
>
> I'm new to Flink, and I have been trying to run a simple python flink
> script that consumes messages from Kafka as well as the examples locally
> with a few issues.
>
> 1. When I run the word count example using `./flink-1.17.0/bin/flink run
> --python flink-1.17.0/examples/python/datastream/word_count.py`, I get the
> following error:
>
> ```
> Traceback (most recent call last):
>   File
> "/Users/jill/flink-1.17.0/examples/python/datastream/word_count.py", line
> 134, in <module>
>     word_count(known_args.input, known_args.output)
>   File
> "/Users/jill/flink-1.17.0/examples/python/datastream/word_count.py", line
> 89, in word_count
>     ds = ds.flat_map(split) \
>   File
> "/Users/jill/flink-1.17.0/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
> line 354, in flat_map
>   File
> "/Users/jill/flink-1.17.0/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
> line 654, in process
>   File "<frozen zipimport>", line 259, in load_module
>   File
> "/Users/jill/flink-1.17.0/opt/python/pyflink.zip/pyflink/fn_execution/flink_fn_execution_pb2.py",
> line 22, in <module>
> ModuleNotFoundError: No module named 'google'
> org.apache.flink.client.program.ProgramAbortException:
> java.lang.RuntimeException: Python process exits with code: 1
> at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:140)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
> at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
> at
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
> at
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
> at
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
> at
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
> at
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: Python process exits with code: 1
> at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:130)
> ... 14 more
> ```
>
> This issue happens with and without `-pyclientexec venv/bin/python3`. When
> running the example using `python word_count.py`, I get a similar error but
> the `pyflink` module is not found. I installed pyflink using `python -m pip
> install apache-flink` within a python venv (not a conda one) and downloaded
> the corresponding binary. Looking through this mailing list's archive as
> well as on Stack Overflow, I saw similar issues, and the fixes were usually
> from installing flink incorrectly. For me, Flink and its dependencies
> (including protobuf) are in
> `.../Library/Python/3.9/lib/python/site-packages/...`. I'm pretty sure I'm
> doing something silly, but I'm not quite sure what the fix is here.
>
> 2. When I run my Python script (consumes json-formatted messages from
> Kafka to a datastream and prints) using `./flink-1.17.0/bin/flink run
> --python kafka_consumer.py`, I see the same issue as the traceback above.
> When using `python kafka_consumer.py`, the program hangs on
> `env.execute()`. I see no running jobs on the dashboard. Is there a good
> way to go about debugging this? Should I be waiting a while for it to start
> running?
>
> Thanks in advance!
>
> - Jill
>
>
>
>