Posted to user@flink.apache.org by Prez Cannady <revprez@opencorrelate.org> on 2016/06/16 21:51:18 UTC

Moving from single-node, Maven examples to cluster execution

I'm having a hard time getting my head around how to deploy my Flink programs to a pre-configured, remote Flink cluster.

My Mavenized setup uses Spring Boot (to simplify classpath handling and generate pretty logs) to provision a StreamExecutionEnvironment with Kafka sources and sinks. I can also run it quite effectively the standard way (`java -jar …`).  What I'm unclear on is how I might go about distributing this code to run on an existing Flink cluster.  Where do I drop the jars? Do I need to restart Flink to do so?

import org.slf4j.LoggerFactory
import org.apache.flink.streaming.api.scala._
import org.springframework.boot.CommandLineRunner

class AppRunner extends CommandLineRunner {

    val log = LoggerFactory.getLogger(classOf[AppRunner])

    override def run(args: String*): Unit = {
        val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

        val consumer = …
        val producer = ...

        val stream = env.addSource(consumer)

        stream
            …
            // Do some stuff
            …
            .addSink(producer)

        env.execute()
    }
}
…

import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication

@SpringBootApplication
object App {

    @throws(classOf[Exception])
    def main(args: Array[String]): Unit = {
        SpringApplication.run(classOf[AppRunner], args: _*)
    }
}
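
For reference, the elided consumer and producer follow this general shape; the topic names, addresses, and the Kafka 0.8 connector below are placeholders standing in for my actual configuration:

import java.util.Properties
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer08, FlinkKafkaProducer08}
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

// Placeholder properties; the real broker/ZooKeeper addresses and group id differ.
val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("zookeeper.connect", "localhost:2181")  // required by the 0.8 consumer
props.setProperty("group.id", "example-group")

// Kafka 0.8 connector classes from flink-connector-kafka-0.8.
val consumer = new FlinkKafkaConsumer08[String]("input-topic", new SimpleStringSchema(), props)
val producer = new FlinkKafkaProducer08[String]("localhost:9092", "output-topic", new SimpleStringSchema())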


Try as I might, I couldn't find any clear instructions for this in the documentation.  The cluster setup guide ends with starting the cluster.

https://ci.apache.org/projects/flink/flink-docs-release-0.8/cluster_setup.html#starting-flink

The WikiEdits example doesn't involve any third-party dependencies, so I'm not clear on how to manage the classpath for it.

https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/run_example_quickstart.html

Any help getting me onto the right path, preferably a best-practices one, would be appreciated.


Prez Cannady  
p: 617 500 3378  
e: revprez@opencorrelate.org
GH: https://github.com/opencorrelate
LI: https://www.linkedin.com/in/revprez

Re: Moving from single-node, Maven examples to cluster execution

Posted by Prez Cannady <re...@correlatesystems.com>.
All right, I figured I’d have to do shading, but hadn’t gotten around to experimenting.

I’ll try it out.

Prez Cannady  
p: 617 500 3378  
e: revprez@opencorrelate.org
GH: https://github.com/opencorrelate
LI: https://www.linkedin.com/in/revprez



Re: Moving from single-node, Maven examples to cluster execution

Posted by Josh <jo...@gmail.com>.
Hi Prez,

You need to build a jar with all your dependencies bundled inside. With Maven you can use the maven-assembly-plugin for this, or with SBT there's sbt-assembly.
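
If you go the SBT route, a minimal sketch might look like this (plugin and Flink versions here are illustrative; marking Flink itself as "provided" keeps the runtime out of your fat jar):

// project/plugins.sbt (sbt-assembly version is illustrative)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt
libraryDependencies ++= Seq(
  // "provided": the cluster already has the Flink runtime on its classpath
  "org.apache.flink" %% "flink-streaming-scala" % "1.0.3" % "provided",
  // connectors are not on the cluster classpath, so bundle them
  "org.apache.flink" %% "flink-connector-kafka-0.8" % "1.0.3"
)

// resolve the duplicate META-INF entries that typically break assembly
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}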

Once you've done this, you can log in to the JobManager node of your Flink cluster, copy the jar across, and use the Flink command line tool to submit jobs to the running cluster, e.g. (from the Flink root directory):

./bin/flink run -c my.application.MainClass /path/to/YourApp.jar

See https://ci.apache.org/projects/flink/flink-docs-master/apis/cli.html

You can also run the Flink command line tool locally and submit the jar to a remote JobManager with the -m flag, although I don't do that at the moment because it doesn't work as easily when you're running Flink on AWS/EMR.
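
One more option, sketched here with placeholders (I haven't tried this with Spring Boot in the mix): create a remote environment in code, pointing at the JobManager and shipping the fat jar along with it:

import org.apache.flink.streaming.api.scala._

// Placeholders: substitute your JobManager host and the path to your fat jar.
// 6123 is the default JobManager RPC port.
val env = StreamExecutionEnvironment.createRemoteEnvironment(
  "jobmanager-host", 6123, "/path/to/YourApp.jar")

// ... define the same Kafka pipeline as usual against this env, then:
env.execute("remote-job")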

Josh
