Posted to user@spark.apache.org by Felix Garcia Borrego <fb...@gilt.com> on 2014/09/05 12:37:59 UTC
New sbt plugin to deploy jobs to EC2
As far as I know, in order to deploy and execute jobs on EC2 you need to
assemble your project, copy your jar onto the cluster, log in using ssh
and submit the job.
To avoid all this I've been prototyping an sbt plugin (1) that
lets you create and send Spark jobs to an Amazon EC2 cluster directly from
your local machine using sbt.
It's a simple plugin that relies on spark-ec2 and spark-submit, but
I'd like to get feedback and see whether this plugin makes sense before
going ahead with the final implementation, or whether there is an easier way to do so.
(1) https://github.com/felixgborrego/sbt-spark-ec2-plugin
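For readers who want to try it: based on the coordinates and commands that appear later in this thread (the com.gilt:sbt-spark-ec2-plugin artifact, the bintray resolver, and the sparkLaunchCluster task), a minimal setup would presumably look like the sketch below; the plugin's README is authoritative and the exact keys and versions may differ.

```scala
// project/plugins.sbt -- sketch; coordinates taken from this thread,
// check the plugin README for the current version
resolvers += Resolver.url(
  "bintray Repository",
  url("http://dl.bintray.com/felixgborrego/repo"))(Resolver.ivyStylePatterns)

addSbtPlugin("com.gilt" % "sbt-spark-ec2-plugin" % "0.1.5")
```

Then add `sparkec2.Ec2SparkPluginSettings.sparkSettings` to build.sbt and run `sbt sparkLaunchCluster` to bring up the cluster.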
Thanks,
Re: New sbt plugin to deploy jobs to EC2
Posted by Felix Garcia Borrego <fb...@gilt.com>.
Hi Shafaq,
Sorry for the delay. I've created an example project that shows how to declare
the dependency on the plugin:
https://github.com/felixgborrego/sbt-spark-ec2-plugin/tree/master/example-spark-ec2.
You are right, there was an issue with the resolver for the ssh lib. I've
also updated the plugin to work with Spark 1.1.0.
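A plausible reading of that resolver issue (hedged; the example project linked above is authoritative): janalyse-ssh is a dependency of the plugin itself, so its repository has to be visible to the meta-build in project/plugins.sbt, not only in build.sbt, for sbt to resolve it.

```scala
// project/plugins.sbt -- sketch: make the ssh lib's repository visible
// to the build definition, where plugin dependencies are resolved
resolvers += "JAnalyse Repository" at "http://www.janalyse.fr/repository/"
```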
Thanks,
On Wed, Sep 24, 2014 at 8:35 PM, Shafaq <s....@gmail.com> wrote:
> Hi,
>
> I'm testing out the Spark EC2 deployment plugin.
>
> I tried to compile using:
>
> $sbt sparkLaunchCluster
>
>
> -----------------------------------------------------------------------------------------------
> [info] Resolving org.fusesource.jansi#jansi;1.4 ...
> [warn] ::::::::::::::::::::::::::::::::::::::::::::::
> [warn] :: UNRESOLVED DEPENDENCIES ::
> [warn] ::::::::::::::::::::::::::::::::::::::::::::::
> [warn] :: fr.janalyse#janalyse-ssh_2.10;0.9.13: not found
> [warn] ::::::::::::::::::::::::::::::::::::::::::::::
> [warn]
> [warn] Note: Unresolved dependencies path:
> [warn] fr.janalyse:janalyse-ssh_2.10:0.9.13
> [warn] +- com.gilt:lib-spark-manager_2.10:0.0.3.9
> [warn] +- com.gilt:sbt-spark-ec2-plugin:0.1.5 (sbtVersion=0.13, scalaVersion=2.10) (/Users/saq/Work/spark-datapipeline/project/plugins.sbt#L6-7)
> [warn] +- default:spark-datapipeline-build:0.1-SNAPSHOT (sbtVersion=0.13, scalaVersion=2.10)
> sbt.ResolveException: unresolved dependency: fr.janalyse#janalyse-ssh_2.10;0.9.13: not found
> at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:243)
> at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:158)
> at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:156)
> at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:147)
> at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:147)
> at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:124)
> at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:56)
> at sbt.IvySbt$$anon$3.call(Ivy.scala:64)
> at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
> at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
> at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
> at xsbt.boot.Using$.withResource(Using.scala:10)
> at xsbt.boot.Using$.apply(Using.scala:9)
> at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
> at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
> at xsbt.boot.Locks$.apply0(Locks.scala:31)
> at xsbt.boot.Locks$.apply(Locks.scala:28)
> at sbt.IvySbt.withDefaultLogger(Ivy.scala:64)
> at sbt.IvySbt.withIvy(Ivy.scala:119)
> at sbt.IvySbt.withIvy(Ivy.scala:116)
> at sbt.IvySbt$Module.withModule(Ivy.scala:147)
>
> ------------------------------------------------------
>
> I have to use Scala 2.10, since Spark uses it, and my sbt version is 0.13.
>
> My build.sbt looks as follows:
>
> name := "scala-datapipeline"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
>
> scalacOptions ++= Seq( "-deprecation", "-unchecked", "-feature")
>
>
>
> sparkec2.Ec2SparkPluginSettings.sparkSettings
>
>
> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>
> resolvers += "spray" at "http://repo.spray.io/"
>
>
> resolvers += Resolver.url(
> "bintray Repository",
> url("http://dl.bintray.com/felixgborrego/repo"))(
> Resolver.ivyStylePatterns)
>
> resolvers += "JAnalyse Repository" at "http://www.janalyse.fr/repository/"
>
>
>
>
>
> libraryDependencies ++= Seq(
> "org.apache.spark" %% "spark-core" % "1.1.0", // 1.0.2
> "org.apache.spark" %% "spark-sql" % "1.1.0",
> "org.apache.spark" %% "spark-hive" % "1.1.0",
> "com.github.nscala-time" %% "nscala-time" % "1.0.0",
> "org.json4s" %% "json4s-native" % "3.2.10",
> "com.codahale" %% "jerkson_2.9.1" % "0.5.0",
> "fr.janalyse" % "janalyse-ssh" % "0.9.10"
>
> )
>
>
> On Fri, Sep 5, 2014 at 4:08 AM, andy petrella <an...@gmail.com>
> wrote:
>
>> \o/ => will test it soon or sooner, gr8 idea btw
>>
>> aℕdy ℙetrella
>> about.me/noootsab
>>
>> <http://about.me/noootsab>
>>
>>
>> On Fri, Sep 5, 2014 at 12:37 PM, Felix Garcia Borrego <fb...@gilt.com>
>> wrote:
>>
>>> As far as I know, in order to deploy and execute jobs on EC2 you need to
>>> assemble your project, copy your jar onto the cluster, log in using ssh
>>> and submit the job.
>>>
>>> To avoid all this I've been prototyping an sbt plugin (1) that
>>> lets you create and send Spark jobs to an Amazon EC2 cluster directly from
>>> your local machine using sbt.
>>>
>>> It's a simple plugin that relies on spark-ec2 and spark-submit,
>>> but I'd like to get feedback and see whether this plugin makes sense
>>> before going ahead with the final implementation, or whether there is an easier way to do so.
>>>
>>> (1) https://github.com/felixgborrego/sbt-spark-ec2-plugin
>>>
>>> Thanks,
>>>
>>>
>>>
>>
>
>
> --
> Kind Regards,
> Shafaq
>
>
Re: New sbt plugin to deploy jobs to EC2
Posted by Shafaq <s....@gmail.com>.
Hi,
I'm testing out the Spark EC2 deployment plugin.
I tried to compile using:
$sbt sparkLaunchCluster
-----------------------------------------------------------------------------------------------
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: fr.janalyse#janalyse-ssh_2.10;0.9.13: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn] Note: Unresolved dependencies path:
[warn] fr.janalyse:janalyse-ssh_2.10:0.9.13
[warn] +- com.gilt:lib-spark-manager_2.10:0.0.3.9
[warn] +- com.gilt:sbt-spark-ec2-plugin:0.1.5 (sbtVersion=0.13, scalaVersion=2.10) (/Users/saq/Work/spark-datapipeline/project/plugins.sbt#L6-7)
[warn] +- default:spark-datapipeline-build:0.1-SNAPSHOT (sbtVersion=0.13, scalaVersion=2.10)
sbt.ResolveException: unresolved dependency: fr.janalyse#janalyse-ssh_2.10;0.9.13: not found
at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:243)
at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:158)
at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:156)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:147)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:147)
at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:124)
at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:56)
at sbt.IvySbt$$anon$3.call(Ivy.scala:64)
at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
at xsbt.boot.Using$.withResource(Using.scala:10)
at xsbt.boot.Using$.apply(Using.scala:9)
at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
at xsbt.boot.Locks$.apply0(Locks.scala:31)
at xsbt.boot.Locks$.apply(Locks.scala:28)
at sbt.IvySbt.withDefaultLogger(Ivy.scala:64)
at sbt.IvySbt.withIvy(Ivy.scala:119)
at sbt.IvySbt.withIvy(Ivy.scala:116)
at sbt.IvySbt$Module.withModule(Ivy.scala:147)
------------------------------------------------------
I have to use Scala 2.10, since Spark uses it, and my sbt version is 0.13.
My build.sbt looks as follows:
name := "scala-datapipeline"

version := "1.0"

scalaVersion := "2.10.4"

scalacOptions ++= Seq("-deprecation", "-unchecked", "-feature")

sparkec2.Ec2SparkPluginSettings.sparkSettings

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

resolvers += "spray" at "http://repo.spray.io/"

resolvers += Resolver.url(
  "bintray Repository",
  url("http://dl.bintray.com/felixgborrego/repo"))(
  Resolver.ivyStylePatterns)

resolvers += "JAnalyse Repository" at "http://www.janalyse.fr/repository/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0", // 1.0.2
  "org.apache.spark" %% "spark-sql" % "1.1.0",
  "org.apache.spark" %% "spark-hive" % "1.1.0",
  "com.github.nscala-time" %% "nscala-time" % "1.0.0",
  "org.json4s" %% "json4s-native" % "3.2.10",
  "com.codahale" %% "jerkson_2.9.1" % "0.5.0",
  "fr.janalyse" % "janalyse-ssh" % "0.9.10"
)
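Separately from the resolver issue, note that the build above pins "fr.janalyse" % "janalyse-ssh" % "0.9.10" (not cross-versioned), while the plugin's dependency chain wants janalyse-ssh_2.10;0.9.13. Assuming a cross-built 0.9.13 artifact is actually published (an assumption; check the repository), aligning the two would avoid mixed versions on the classpath:

```scala
// build.sbt -- sketch; assumes the JAnalyse repository listed in the
// resolvers above publishes a Scala 2.10 cross-built 0.9.13 artifact
libraryDependencies += "fr.janalyse" %% "janalyse-ssh" % "0.9.13"
```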
On Fri, Sep 5, 2014 at 4:08 AM, andy petrella <an...@gmail.com>
wrote:
> \o/ => will test it soon or sooner, gr8 idea btw
>
> aℕdy ℙetrella
> about.me/noootsab
>
> <http://about.me/noootsab>
>
>
> On Fri, Sep 5, 2014 at 12:37 PM, Felix Garcia Borrego <fb...@gilt.com>
> wrote:
>
>> As far as I know, in order to deploy and execute jobs on EC2 you need to
>> assemble your project, copy your jar onto the cluster, log in using ssh
>> and submit the job.
>>
>> To avoid all this I've been prototyping an sbt plugin (1) that
>> lets you create and send Spark jobs to an Amazon EC2 cluster directly from
>> your local machine using sbt.
>>
>> It's a simple plugin that relies on spark-ec2 and spark-submit,
>> but I'd like to get feedback and see whether this plugin makes sense
>> before going ahead with the final implementation, or whether there is an easier way to do so.
>>
>> (1) https://github.com/felixgborrego/sbt-spark-ec2-plugin
>>
>> Thanks,
>>
>>
>>
>
--
Kind Regards,
Shafaq
Re: New sbt plugin to deploy jobs to EC2
Posted by andy petrella <an...@gmail.com>.
\o/ => will test it soon or sooner, gr8 idea btw
aℕdy ℙetrella
about.me/noootsab
<http://about.me/noootsab>
On Fri, Sep 5, 2014 at 12:37 PM, Felix Garcia Borrego <fb...@gilt.com>
wrote:
> As far as I know, in order to deploy and execute jobs on EC2 you need to
> assemble your project, copy your jar onto the cluster, log in using ssh
> and submit the job.
>
> To avoid all this I've been prototyping an sbt plugin (1) that
> lets you create and send Spark jobs to an Amazon EC2 cluster directly from
> your local machine using sbt.
>
> It's a simple plugin that relies on spark-ec2 and spark-submit, but
> I'd like to get feedback and see whether this plugin makes sense before
> going ahead with the final implementation, or whether there is an easier way to do so.
>
> (1) https://github.com/felixgborrego/sbt-spark-ec2-plugin
>
> Thanks,
>
>
>