You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Joe Malt <jm...@yelp.com> on 2018/07/25 21:32:15 UTC

Running a Python streaming job with Java dependencies

Hi,

I'm trying to run a job with Flink's new Python streaming API but I'm
running into issues with Java imports.

I have a Jython project in IntelliJ with a lot of Java dependencies
configured through Maven. I can't figure out how to make Flink "see" these
dependencies.

An example script that exhibits the problem is the following (it's the
streaming example from the docs (
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/python.html#streaming-program-example)
but with an extra import added)

from org.apache.flink.streaming.api.functions.source import SourceFunction
from org.apache.flink.api.common.functions import FlatMapFunction,
ReduceFunction
from org.apache.flink.api.java.functions import KeySelector
from org.apache.flink.streaming.api.windowing.time.Time import milliseconds

# Added an extra import, this fails with an ImportError
import com.google.gson.GsonBuilder


class Generator(SourceFunction):
    def __init__(self, num_iters):
        self._running = True
        self._num_iters = num_iters

# ... rest of the file is as in the documentation


This runs without any exceptions when run from IntelliJ (assuming
com.google.gson is added in the POM), but when I try to run it as a Flink
job with this command:

./pyflink-stream.sh ~/flink-python/MinimalExample.py - --local

it fails to find the dependency:

Starting execution of program
Failed to run plan: null
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File
"/var/folders/t1/gcltcjcn5zdgqfqrc32xk90x85xkg9/T/flink_streaming_plan_0bfab09c-baeb-414f-a718-01a5c71b3507/MinimalExample.py",
line 7, in <module>
ImportError: No module named google

How can I point pyflink-stream.sh to these Maven dependencies? I've tried
modifying the script to add my .m2/ directory to the classpath (using flink
run -C), but that didn't make any difference:

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

. "$bin"/config.sh

"$FLINK_BIN_DIR"/flink run -C "file:///Users/jmalt/.m2/" --class
org.apache.flink.streaming.python.api.PythonStreamBinder -v
"$FLINK_ROOT_DIR"/opt/flink-streaming-python*.jar "$@"


Thanks,

Joe Malt

Engineering Intern, Stream Processing
Yelp

Re: Running a Python streaming job with Java dependencies

Posted by Chesnay Schepler <ch...@apache.org>.
To use java classes not bundled with FLink you will have to place a jar 
containing said classes into the /lib directory of the distribution.

On 25.07.2018 23:32, Joe Malt wrote:
> Hi,
>
> I'm trying to run a job with Flink's new Python streaming API but I'm 
> running into issues with Java imports.
>
> I have a Jython project in IntelliJ with a lot of Java dependencies 
> configured through Maven. I can't figure out how to make Flink "see" 
> these dependencies.
>
> An example script that exhibits the problem is the following (it's the 
> streaming example from the docs 
> (https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/python.html#streaming-program-example) 
> but with an extra import added)
>
> from org.apache.flink.streaming.api.functions.sourceimport SourceFunction
> from org.apache.flink.api.common.functionsimport FlatMapFunction, ReduceFunction
> from org.apache.flink.api.java.functionsimport KeySelector
> from org.apache.flink.streaming.api.windowing.time.Timeimport milliseconds
>
> # Added an extra import, this fails with an ImportError import com.google.gson.GsonBuilder
>
>
> class Generator(SourceFunction):
>      def __init__(self, num_iters):
>          self._running =True self._num_iters = num_iters# ... rest of the file is as in the documentation
>
> This runs without any exceptions when run from IntelliJ (assuming 
> com.google.gson is added in the POM), but when I try to run it as a 
> Flink job with this command:
>
> ./pyflink-stream.sh ~/flink-python/MinimalExample.py - --local
>
> it fails to find the dependency:
>
> Starting execution of program
> Failed to run plan: null
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File 
> "/var/folders/t1/gcltcjcn5zdgqfqrc32xk90x85xkg9/T/flink_streaming_plan_0bfab09c-baeb-414f-a718-01a5c71b3507/MinimalExample.py", 
> line 7, in <module>
> ImportError: No module named google
>
> How can I point pyflink-stream.sh to these Maven dependencies? I've 
> tried modifying the script to add my .m2/ directory to the classpath 
> (using flink run -C), but that didn't make any difference:
>
> bin=`dirname "$0"`
> bin=`cd "$bin"; pwd`
>
> . "$bin"/config.sh
>
> "$FLINK_BIN_DIR"/flink run -C "file:///Users/jmalt/.m2/" --class 
> org.apache.flink.streaming.python.api.PythonStreamBinder -v 
> "$FLINK_ROOT_DIR"/opt/flink-streaming-python*.jar "$@"
>
>
> Thanks,
>
> Joe Malt
>
> Engineering Intern, Stream Processing
> Yelp