Posted to user@spark.apache.org by nightwolf <ni...@gmail.com> on 2014/07/31 01:51:45 UTC

Spark Deployment Patterns - Automated Deployment & Performance Testing

Hi all,

We are developing an application which uses Spark & Hive to do static and
ad-hoc reporting. For these static reports, they take a number of parameters
and then run over a data set. We would like to make it easier to test
performance of these reports on a cluster.

If we have a test cluster running with a sufficient sample data set which
developers can share. To speed up development time, what is the best way to
deploy a Spark application to a Spark cluster (in standalone) via an IDE?

I'm thinking we would create an SBT task which would run the spark-submit
script. Is there a better way?
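
A minimal sketch of such a task, for illustration only (the task name, main class, master URL, and paths are all hypothetical, and this assumes a 2014-era sbt 0.13 build):

```scala
// build.sbt (sketch) -- a custom task that packages the thin app jar
// and shells out to spark-submit. All names below are placeholders.
import scala.sys.process._

lazy val sparkDeploy = taskKey[Unit]("Package the app and submit it to the dev cluster")

sparkDeploy := {
  val jar = (packageBin in Compile).value      // the thin application jar
  val cmd = Seq(
    "spark-submit",
    "--class", "com.example.reports.Main",     // hypothetical main class
    "--master", "spark://dev-master:7077",     // hypothetical standalone master
    jar.getAbsolutePath
  )
  val exit = cmd.!                             // run spark-submit, wait for exit
  if (exit != 0) sys.error(s"spark-submit failed with exit code $exit")
}
```

A Jenkins job could then just invoke `sbt sparkDeploy` as a shell step.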

Eventually this will feed into some automated performance testing which we
plan to run as a twice-daily Jenkins job. If it's an SBT deploy task, it's
easy to call from Jenkins. Is there a better way to do this?

Posted on StackOverflow as well:
http://stackoverflow.com/questions/25048784/spark-automated-deployment-performance-testing 

Any advice/experience appreciated!

Cheers!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Deployment-Patterns-Automated-Deployment-Performance-Testing-tp11000.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Deployment Patterns - Automated Deployment & Performance Testing

Posted by nightwolf <ni...@gmail.com>.
Thanks AL! 

That's what I thought. I've set up Nexus to host the Spark libs and download
them when needed.

For development purposes, suppose we have a dev cluster. Is it possible to
run the driver program locally (on a developer's machine)?

I.e. just run the driver from the IDE and have it connect to the master and
worker nodes to ship out its tasks?
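
For anyone reading along in the archive: with a standalone cluster this is possible by pointing the driver at the master URL and telling Spark which jars to ship to the executors. A minimal sketch, where the master host and jar path are placeholders for whatever your dev cluster uses:

```scala
// Sketch: run the driver from the IDE against a remote standalone cluster.
// The master URL and jar path are placeholders, not real values.
import org.apache.spark.{SparkConf, SparkContext}

object DevDriver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("report-dev")
      .setMaster("spark://dev-master:7077")        // standalone master, not local[*]
      .setJars(Seq("target/scala-2.10/app.jar"))   // ship your app classes to the workers
    val sc = new SparkContext(conf)
    try {
      // Trivial job just to verify connectivity to the cluster.
      println(sc.parallelize(1 to 100).sum())
    } finally {
      sc.stop()
    }
  }
}
```

One caveat: the executors connect back to the driver, so the developer's machine must be reachable from the worker nodes (watch out for firewalls, NAT, and VPNs).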

Cheers,
N



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Deployment-Patterns-Automated-Deployment-Performance-Testing-tp11000p11414.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark Deployment Patterns - Automated Deployment & Performance Testing

Posted by Andrew Lee <al...@hotmail.com>.
You should be able to use either SBT or Maven to create your JAR files (not a fat jar), and deploy only the JAR for spark-submit.

1. Sync the Spark libs and versions between your development environment and the CLASSPATH in your IDE. Unfortunately this needs to be hard-copied, and it can result in split-brain syndrome and version inconsistency if you don't manage it with your Spark Jenkins pipeline (assuming you are building Spark yourself; if not, it's easier: just make sure you have the same copy on HDFS or S3 for reuse).

2. Copy only your application jar and reuse the assembly jar for Spark core. Either copy it to HDFS manually, or let spark-submit pick up your jars and deploy them into .sparkStaging.

You don't need to rebuild Spark every time. 
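
One common way to keep the application jar thin, assuming the cluster already supplies the Spark assembly at runtime, is to mark Spark as a "provided" dependency so it is compiled against but not packaged (the version number here is only an example):

```scala
// build.sbt (fragment) -- compile against Spark but leave it out of the
// application jar; the cluster's assembly jar supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2" % "provided"
```

Note that in sbt, "provided" dependencies are also excluded from the classpath of `sbt run`, so running the driver locally from sbt may need a workaround.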

Hope this helps.

AL
