You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Joe Malt <jm...@yelp.com> on 2018/07/25 21:32:15 UTC
Running a Python streaming job with Java dependencies
Hi,
I'm trying to run a job with Flink's new Python streaming API but I'm
running into issues with Java imports.
I have a Jython project in IntelliJ with a lot of Java dependencies
configured through Maven. I can't figure out how to make Flink "see" these
dependencies.
An example script that exhibits the problem is the following (it's the
streaming example from the docs (
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/python.html#streaming-program-example)
but with an extra import added)
from org.apache.flink.streaming.api.functions.source import SourceFunction
from org.apache.flink.api.common.functions import FlatMapFunction,
ReduceFunction
from org.apache.flink.api.java.functions import KeySelector
from org.apache.flink.streaming.api.windowing.time.Time import milliseconds
# Added an extra import, this fails with an ImportError
import com.google.gson.GsonBuilder
class Generator(SourceFunction):
def __init__(self, num_iters):
self._running = True
self._num_iters = num_iters
# ... rest of the file is as in the documentation
This runs without any exceptions when run from IntelliJ (assuming
com.google.gson is added in the POM), but when I try to run it as a Flink
job with this command:
./pyflink-stream.sh ~/flink-python/MinimalExample.py - --local
it fails to find the dependency:
Starting execution of program
Failed to run plan: null
Traceback (most recent call last):
File "<string>", line 1, in <module>
File
"/var/folders/t1/gcltcjcn5zdgqfqrc32xk90x85xkg9/T/flink_streaming_plan_0bfab09c-baeb-414f-a718-01a5c71b3507/MinimalExample.py",
line 7, in <module>
ImportError: No module named google
How can I point pyflink-stream.sh to these Maven dependencies? I've tried
modifying the script to add my .m2/ directory to the classpath (using flink
run -C), but that didn't make any difference:
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
. "$bin"/config.sh
"$FLINK_BIN_DIR"/flink run -C "file:///Users/jmalt/.m2/" --class
org.apache.flink.streaming.python.api.PythonStreamBinder -v
"$FLINK_ROOT_DIR"/opt/flink-streaming-python*.jar "$@"
Thanks,
Joe Malt
Engineering Intern, Stream Processing
Yelp
Re: Running a Python streaming job with Java dependencies
Posted by Chesnay Schepler <ch...@apache.org>.
To use java classes not bundled with FLink you will have to place a jar
containing said classes into the /lib directory of the distribution.
On 25.07.2018 23:32, Joe Malt wrote:
> Hi,
>
> I'm trying to run a job with Flink's new Python streaming API but I'm
> running into issues with Java imports.
>
> I have a Jython project in IntelliJ with a lot of Java dependencies
> configured through Maven. I can't figure out how to make Flink "see"
> these dependencies.
>
> An example script that exhibits the problem is the following (it's the
> streaming example from the docs
> (https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/python.html#streaming-program-example)
> but with an extra import added)
>
> from org.apache.flink.streaming.api.functions.sourceimport SourceFunction
> from org.apache.flink.api.common.functionsimport FlatMapFunction, ReduceFunction
> from org.apache.flink.api.java.functionsimport KeySelector
> from org.apache.flink.streaming.api.windowing.time.Timeimport milliseconds
>
> # Added an extra import, this fails with an ImportError import com.google.gson.GsonBuilder
>
>
> class Generator(SourceFunction):
> def __init__(self, num_iters):
> self._running =True self._num_iters = num_iters# ... rest of the file is as in the documentation
>
> This runs without any exceptions when run from IntelliJ (assuming
> com.google.gson is added in the POM), but when I try to run it as a
> Flink job with this command:
>
> ./pyflink-stream.sh ~/flink-python/MinimalExample.py - --local
>
> it fails to find the dependency:
>
> Starting execution of program
> Failed to run plan: null
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> File
> "/var/folders/t1/gcltcjcn5zdgqfqrc32xk90x85xkg9/T/flink_streaming_plan_0bfab09c-baeb-414f-a718-01a5c71b3507/MinimalExample.py",
> line 7, in <module>
> ImportError: No module named google
>
> How can I point pyflink-stream.sh to these Maven dependencies? I've
> tried modifying the script to add my .m2/ directory to the classpath
> (using flink run -C), but that didn't make any difference:
>
> bin=`dirname "$0"`
> bin=`cd "$bin"; pwd`
>
> . "$bin"/config.sh
>
> "$FLINK_BIN_DIR"/flink run -C "file:///Users/jmalt/.m2/" --class
> org.apache.flink.streaming.python.api.PythonStreamBinder -v
> "$FLINK_ROOT_DIR"/opt/flink-streaming-python*.jar "$@"
>
>
> Thanks,
>
> Joe Malt
>
> Engineering Intern, Stream Processing
> Yelp