Posted to user@spark.apache.org by Felix Garcia Borrego <fb...@gilt.com> on 2014/09/05 12:37:59 UTC

New sbt plugin to deploy jobs to EC2

As far as I know, in order to deploy and execute jobs on EC2 you need to
assemble your project, copy your jar onto the cluster, log in using ssh
and submit the job.

To avoid having to do this I've been prototyping an sbt plugin(1) that
lets you create and send Spark jobs to an Amazon EC2 cluster directly from
your local machine using sbt.

It's a simple plugin that actually relies on spark-ec2 and spark-submit, but
I'd like to get some feedback and see whether this plugin makes sense before
going ahead with the final implementation, or whether there is an easier way
to do this.

(1) https://github.com/felixgborrego/sbt-spark-ec2-plugin
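
In practice the wiring looks roughly like this (a sketch only; the README in
the repo above is the source of truth, and the task names may still change
before the final implementation):

// project/plugins.sbt: pull in the plugin (published as an ivy-style
// artifact on Bintray)
resolvers += Resolver.url(
  "bintray Repository",
  url("http://dl.bintray.com/felixgborrego/repo"))(
    Resolver.ivyStylePatterns)

addSbtPlugin("com.gilt" % "sbt-spark-ec2-plugin" % "0.1.5")

// build.sbt: enable the plugin's settings alongside the usual project
// definition
sparkec2.Ec2SparkPluginSettings.sparkSettings

// then, from the project root, launch the EC2 cluster (this wraps spark-ec2):
//   $ sbt sparkLaunchCluster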

Thanks,

Re: New sbt plugin to deploy jobs to EC2

Posted by Felix Garcia Borrego <fb...@gilt.com>.
Hi Shafaq,

Sorry for the delay. I've created an example project that shows how to declare
the dependencies for the plugin:
https://github.com/felixgborrego/sbt-spark-ec2-plugin/tree/master/example-spark-ec2
You are right, there was an issue with the resolver for the ssh lib. I've
also updated the plugin to work with Spark 1.1.0.
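
One detail worth calling out: sbt resolves plugins and their dependencies from
project/plugins.sbt (the meta-build), so the extra resolvers need to be
declared there as well; having them only in build.sbt, as in the build further
down, does not help the plugin resolution. As a rough sketch (the example
project above has the exact, current versions):

// project/plugins.sbt: resolvers declared here are the ones used to resolve
// the plugin itself and its transitive dependencies (e.g. janalyse-ssh)
resolvers += Resolver.url(
  "bintray Repository",
  url("http://dl.bintray.com/felixgborrego/repo"))(
    Resolver.ivyStylePatterns)

resolvers += "JAnalyse Repository" at "http://www.janalyse.fr/repository/"

// version as reported in this thread; check the example project for the
// current one
addSbtPlugin("com.gilt" % "sbt-spark-ec2-plugin" % "0.1.5")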

Thanks,



On Wed, Sep 24, 2014 at 8:35 PM, Shafaq <s....@gmail.com> wrote:

> Hi,
>
> I'm testing out the Spark EC2 deployment plugin.
>
> I try to run it using
>
>  $ sbt sparkLaunchCluster
>
>
> -----------------------------------------------------------------------------------------------
> [info] Resolving org.fusesource.jansi#jansi;1.4 ...
> [warn]     ::::::::::::::::::::::::::::::::::::::::::::::
> [warn]     ::          UNRESOLVED DEPENDENCIES         ::
> [warn]     ::::::::::::::::::::::::::::::::::::::::::::::
> [warn]     :: fr.janalyse#janalyse-ssh_2.10;0.9.13: not found
> [warn]     ::::::::::::::::::::::::::::::::::::::::::::::
> [warn]
> [warn]     Note: Unresolved dependencies path:
> [warn]         fr.janalyse:janalyse-ssh_2.10:0.9.13
> [warn]           +- com.gilt:lib-spark-manager_2.10:0.0.3.9
> [warn]           +- com.gilt:sbt-spark-ec2-plugin:0.1.5 (sbtVersion=0.13,
> scalaVersion=2.10)
> (/Users/saq/Work/spark-datapipeline/project/plugins.sbt#L6-7)
> [warn]           +- default:spark-datapipeline-build:0.1-SNAPSHOT
> (sbtVersion=0.13, scalaVersion=2.10)
> sbt.ResolveException: unresolved dependency:
> fr.janalyse#janalyse-ssh_2.10;0.9.13: not found
>     at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:243)
>     at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:158)
>     at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:156)
>     at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:147)
>     at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:147)
>     at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:124)
>     at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:56)
>     at sbt.IvySbt$$anon$3.call(Ivy.scala:64)
>     at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>     at
> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>     at
> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>     at xsbt.boot.Using$.withResource(Using.scala:10)
>     at xsbt.boot.Using$.apply(Using.scala:9)
>     at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>     at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>     at xsbt.boot.Locks$.apply0(Locks.scala:31)
>     at xsbt.boot.Locks$.apply(Locks.scala:28)
>     at sbt.IvySbt.withDefaultLogger(Ivy.scala:64)
>     at sbt.IvySbt.withIvy(Ivy.scala:119)
>     at sbt.IvySbt.withIvy(Ivy.scala:116)
>     at sbt.IvySbt$Module.withModule(Ivy.scala:147)
>
> ------------------------------------------------------
>
> I have to use Scala 2.10 since Spark uses it, and the sbt version is
> 0.13.
>
> My build.sbt looks as follows:
>
> name := "scala-datapipeline"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
>
> scalacOptions ++= Seq( "-deprecation", "-unchecked", "-feature")
>
>
>
> sparkec2.Ec2SparkPluginSettings.sparkSettings
>
>
> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>
> resolvers += "spray" at "http://repo.spray.io/"
>
>
> resolvers += Resolver.url(
>   "bintray Repository",
>   url("http://dl.bintray.com/felixgborrego/repo"))(
>     Resolver.ivyStylePatterns)
>
> resolvers += "JAnalyse Repository" at "http://www.janalyse.fr/repository/"
>
>
>
>
>
> libraryDependencies ++= Seq(
>     "org.apache.spark" %% "spark-core" % "1.1.0",    // 1.0.2
>     "org.apache.spark" %% "spark-sql"  % "1.1.0",
>     "org.apache.spark" %% "spark-hive"  % "1.1.0",
>     "com.github.nscala-time" %% "nscala-time" % "1.0.0",
>     "org.json4s" %% "json4s-native" % "3.2.10",
>     "com.codahale" %% "jerkson_2.9.1" % "0.5.0",
>     "fr.janalyse" % "janalyse-ssh" % "0.9.10"
>
> )
>
> On Fri, Sep 5, 2014 at 4:08 AM, andy petrella <an...@gmail.com>
> wrote:
>
>> \o/ => will test it soon or sooner, gr8 idea btw
>>
>> aℕdy ℙetrella
>> about.me/noootsab
>> [image: aℕdy ℙetrella on about.me]
>>
>> <http://about.me/noootsab>
>>
>>
>> On Fri, Sep 5, 2014 at 12:37 PM, Felix Garcia Borrego <fb...@gilt.com>
>> wrote:
>>
>>> As far as I know, in order to deploy and execute jobs on EC2 you need to
>>> assemble your project, copy your jar onto the cluster, log in using ssh
>>> and submit the job.
>>>
>>> To avoid having to do this I've been prototyping an sbt plugin(1) that
>>> lets you create and send Spark jobs to an Amazon EC2 cluster directly from
>>> your local machine using sbt.
>>>
>>> It's a simple plugin that actually relies on spark-ec2 and spark-submit,
>>> but I'd like to get some feedback and see whether this plugin makes sense
>>> before going ahead with the final implementation, or whether there is an
>>> easier way to do this.
>>>
>>> (1) https://github.com/felixgborrego/sbt-spark-ec2-plugin
>>>
>>> Thanks,
>>>
>>>
>>>
>>
>
>
> --
> Kind Regards,
> Shafaq
>
>

Re: New sbt plugin to deploy jobs to EC2

Posted by Shafaq <s....@gmail.com>.
Hi,

I'm testing out the Spark EC2 deployment plugin.

I try to run it using

 $ sbt sparkLaunchCluster

-----------------------------------------------------------------------------------------------
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
[warn]     ::          UNRESOLVED DEPENDENCIES         ::
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
[warn]     :: fr.janalyse#janalyse-ssh_2.10;0.9.13: not found
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn]     Note: Unresolved dependencies path:
[warn]         fr.janalyse:janalyse-ssh_2.10:0.9.13
[warn]           +- com.gilt:lib-spark-manager_2.10:0.0.3.9
[warn]           +- com.gilt:sbt-spark-ec2-plugin:0.1.5 (sbtVersion=0.13,
scalaVersion=2.10)
(/Users/saq/Work/spark-datapipeline/project/plugins.sbt#L6-7)
[warn]           +- default:spark-datapipeline-build:0.1-SNAPSHOT
(sbtVersion=0.13, scalaVersion=2.10)
sbt.ResolveException: unresolved dependency:
fr.janalyse#janalyse-ssh_2.10;0.9.13: not found
    at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:243)
    at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:158)
    at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:156)
    at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:147)
    at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:147)
    at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:124)
    at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:56)
    at sbt.IvySbt$$anon$3.call(Ivy.scala:64)
    at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
    at
xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
    at
xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
    at xsbt.boot.Using$.withResource(Using.scala:10)
    at xsbt.boot.Using$.apply(Using.scala:9)
    at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
    at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
    at xsbt.boot.Locks$.apply0(Locks.scala:31)
    at xsbt.boot.Locks$.apply(Locks.scala:28)
    at sbt.IvySbt.withDefaultLogger(Ivy.scala:64)
    at sbt.IvySbt.withIvy(Ivy.scala:119)
    at sbt.IvySbt.withIvy(Ivy.scala:116)
    at sbt.IvySbt$Module.withModule(Ivy.scala:147)

------------------------------------------------------

I have to use Scala 2.10 since Spark uses it, and the sbt version is
0.13.

My build.sbt looks as follows:

name := "scala-datapipeline"

version := "1.0"

scalaVersion := "2.10.4"


scalacOptions ++= Seq( "-deprecation", "-unchecked", "-feature")



sparkec2.Ec2SparkPluginSettings.sparkSettings


resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

resolvers += "spray" at "http://repo.spray.io/"


resolvers += Resolver.url(
  "bintray Repository",
  url("http://dl.bintray.com/felixgborrego/repo"))(
    Resolver.ivyStylePatterns)

resolvers += "JAnalyse Repository" at "http://www.janalyse.fr/repository/"





libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.1.0",    // 1.0.2
    "org.apache.spark" %% "spark-sql"  % "1.1.0",
    "org.apache.spark" %% "spark-hive"  % "1.1.0",
    "com.github.nscala-time" %% "nscala-time" % "1.0.0",
    "org.json4s" %% "json4s-native" % "3.2.10",
    "com.codahale" %% "jerkson_2.9.1" % "0.5.0",
    "fr.janalyse" % "janalyse-ssh" % "0.9.10"

)

On Fri, Sep 5, 2014 at 4:08 AM, andy petrella <an...@gmail.com>
wrote:

> \o/ => will test it soon or sooner, gr8 idea btw
>
> aℕdy ℙetrella
> about.me/noootsab
> [image: aℕdy ℙetrella on about.me]
>
> <http://about.me/noootsab>
>
>
> On Fri, Sep 5, 2014 at 12:37 PM, Felix Garcia Borrego <fb...@gilt.com>
> wrote:
>
>> As far as I know, in order to deploy and execute jobs on EC2 you need to
>> assemble your project, copy your jar onto the cluster, log in using ssh
>> and submit the job.
>>
>> To avoid having to do this I've been prototyping an sbt plugin(1) that
>> lets you create and send Spark jobs to an Amazon EC2 cluster directly from
>> your local machine using sbt.
>>
>> It's a simple plugin that actually relies on spark-ec2 and spark-submit,
>> but I'd like to get some feedback and see whether this plugin makes sense
>> before going ahead with the final implementation, or whether there is an
>> easier way to do this.
>>
>> (1) https://github.com/felixgborrego/sbt-spark-ec2-plugin
>>
>> Thanks,
>>
>>
>>
>


-- 
Kind Regards,
Shafaq

Re: New sbt plugin to deploy jobs to EC2

Posted by andy petrella <an...@gmail.com>.
\o/ => will test it soon or sooner, gr8 idea btw

aℕdy ℙetrella
about.me/noootsab
[image: aℕdy ℙetrella on about.me]

<http://about.me/noootsab>


On Fri, Sep 5, 2014 at 12:37 PM, Felix Garcia Borrego <fb...@gilt.com>
wrote:

> As far as I know, in order to deploy and execute jobs on EC2 you need to
> assemble your project, copy your jar onto the cluster, log in using ssh
> and submit the job.
>
> To avoid having to do this I've been prototyping an sbt plugin(1) that
> lets you create and send Spark jobs to an Amazon EC2 cluster directly from
> your local machine using sbt.
>
> It's a simple plugin that actually relies on spark-ec2 and spark-submit,
> but I'd like to get some feedback and see whether this plugin makes sense
> before going ahead with the final implementation, or whether there is an
> easier way to do this.
>
> (1) https://github.com/felixgborrego/sbt-spark-ec2-plugin
>
> Thanks,
>
>
>