Posted to user@beam.apache.org by amir bahmanyari <am...@yahoo.com> on 2016/05/09 19:17:33 UTC

Beam-Kafka in a Flink Cluster Integration Development Guideline

Hi Colleagues,
I have been trying to build and run a Beam FlinkPipelineRunner app in a Flink Cluster for some time.
I followed the procedure at the bottom of this link, which describes how to run it in a Flink Cluster:
dataArtisans/flink-dataflow

  
flink-dataflow - Google Dataflow Runner for Apache Flink

My app builds OK, or at least it seems to.
But when I run it in the Flink Cluster, I get a seemingly endless stream of different exceptions.
Is there a custom pom.xml, or a guideline for developing this app, so that it just builds and runs like the example in the above link?
This is how I run it in Flink Cluster:
./flink run /opt/analytics/apache/beamproject/incubator-beam-master/examples/java/src/main/java/org/apache/beam/examples/beamflink-bench/target/beamflink-bench-1.0-SNAPSHOT.jar --topic lrdata --bootstrap.servers myhost:9092 --zookeeper.connect kirk:2181 --group.id myGroup.

These are the dependencies I currently have in my pom.xml.
Thanks so much for your help. I appreciate your kind attention.
<dependencies>
  <dependency>
    <groupId>com.google.oauth-client</groupId>
    <artifactId>google-oauth-client</artifactId>
    <version>1.22.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>flink-runner_2.10</artifactId>
    <version>0.1.0-incubating-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>com.apache.kafka</groupId>
    <artifactId>kafka-0.1.0</artifactId>
    <version>0.1.0-incubating-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>com.apache.beam</groupId>
    <artifactId>direct-runner-0.1.0</artifactId>
    <version>0.1.0-incubating-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>com.google.collections</groupId>
    <artifactId>google-collections</artifactId>
    <version>1.0-rc2</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>

Re: Beam-Kafka in a Flink Cluster Integration Development Guideline

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
This is a very simple example POM that I threw together. It compiles
the job and runs it on a Flink cluster with Kafka.

There you go:

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

  <modelVersion>4.0.0</modelVersion>

  <groupId>beamtest</groupId>
  <artifactId>beamtest</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>Flink Quickstart Job</name>
  <url>http://www.myorganization.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <flink.version>1.0.0</flink.version>
  </properties>


  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <name>Apache Development Snapshot Repository</name>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

  <!--
    Execute "mvn clean package -Pbuild-jar"
    to build a fat jar file out of this project!
  -->


  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>java-sdk-all</artifactId>
      <version>0.1.0-incubating-SNAPSHOT</version>
    </dependency>

    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>kafka</artifactId>
      <version>0.1.0-incubating-SNAPSHOT</version>
    </dependency>

    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>flink-runner_2.10</artifactId>
      <version>0.1.0-incubating-SNAPSHOT</version>
    </dependency>
  </dependencies>


  <profiles>
    <profile>
      <!-- Profile for packaging correct JAR files -->
      <id>build-jar</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>

      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.1</version>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </profile>
  </profiles>


</project>
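
One thing the POM above leaves open is the job's entry point. If the JAR's manifest names no main class, "flink run" generally needs the class passed explicitly with its -c option. As a minimal sketch, assuming you would rather record the entry point in the fat JAR itself, the Shade plugin entry inside the build-jar profile could carry a manifest transformer. The class name org.example.beamtest.KafkaJob is a hypothetical placeholder and does not come from this thread.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Writes a Main-Class entry into the shaded JAR's manifest. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <!-- Hypothetical class name: replace with your pipeline's main class. -->
            <mainClass>org.example.beamtest.KafkaJob</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With such an entry in place, the JAR produced by "mvn clean package -Pbuild-jar" should be runnable without naming the class on the command line.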

Cheers,
Aljoscha

On Tue, 10 May 2016 at 16:03 Maximilian Michels <mx...@apache.org> wrote:

> Hi Amir,
>
> We need more information to tell you what the problem is. One issue
> with the example POM of the old flink-dataflow runner is that it
> filters out all "org.apache.flink" classes. Some of these classes are
> not actually provided by the Flink cluster environment, e.g.
> connectors like Kafka. So you could try removing the Shade plugin.
> This will increase your JAR size. Another problem people often run
> into is mixing different versions of Flink, e.g. two versions of the
> Scala library. That does not work.
>
> Could you please provide a stack trace? Otherwise we have to guess
> what the problem is :)
>
> Thanks,
> Max

Re: Beam-Kafka in a Flink Cluster Integration Development Guideline

Posted by Maximilian Michels <mx...@apache.org>.
Hi Amir,

We need more information to tell you what the problem is. One issue
with the example POM of the old flink-dataflow runner is that it
filters out all "org.apache.flink" classes. Some of these classes are
not actually provided by the Flink cluster environment, e.g.
connectors like Kafka. So you could try removing the Shade plugin.
This will increase your JAR size. Another problem people often run
into is mixing different versions of Flink, e.g. two versions of the
Scala library. That does not work.
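
As a rough illustration of the filtering issue described above, a middle ground between excluding every "org.apache.flink" artifact and dropping the Shade plugin altogether would be to exclude only the artifacts the cluster is expected to ship. The sketch below is an assumption, not the configuration from the old flink-dataflow POM: the artifact names are the Flink 1.0.0 core modules for Scala 2.10, and which of them your cluster really provides should be checked against its lib directory.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <excludes>
            <!-- Assumed to be on the cluster classpath already. -->
            <exclude>org.apache.flink:flink-core</exclude>
            <exclude>org.apache.flink:flink-java</exclude>
            <exclude>org.apache.flink:flink-runtime_2.10</exclude>
            <exclude>org.apache.flink:flink-streaming-java_2.10</exclude>
            <exclude>org.apache.flink:flink-clients_2.10</exclude>
            <!-- Connectors such as Kafka are deliberately not listed,
                 so they stay inside the fat JAR. -->
          </excludes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Keeping every Scala-dependent artifact on the same suffix (here _2.10) is one way to avoid ending up with two Scala library versions in the same JAR.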

Could you please provide a stack trace? Otherwise we have to guess
what the problem is :)

Thanks,
Max
