Posted to dev@spark.apache.org by Yu Ishikawa <yu...@gmail.com> on 2014/10/03 02:37:10 UTC

What is the best way to build my developing Spark for testing on EC2?

Hi all, 

I am trying to contribute some machine learning algorithms to MLlib. 
I need to evaluate their performance on a cluster, varying the input data 
size, the number of CPU cores, and the algorithms' other parameters.

I would like to build my development version of Spark on EC2 automatically. 
Is there already a build script for a development version, like the spark-ec2
script?
Or, if you have any good ideas for evaluating the performance of an 
in-development MLlib algorithm on a Spark cluster such as EC2, could you share them?

Best,



-----
-- Yu Ishikawa
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-the-best-way-to-build-my-developing-Spark-for-testing-on-EC2-tp8638.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: What is the best way to build my developing Spark for testing on EC2?

Posted by Yu Ishikawa <yu...@gmail.com>.
Hi Evan,

Sorry for my late reply, and thank you for your comment.

> As far as cluster set up goes, I usually launch spot instances with the
> spark-ec2 scripts, 
> and then check out a repo which contains a simple driver application for
> my code. 
> Then I have something crude like bash scripts running my program and
> collecting output. 

It is just as you described. I agree with you.

> You could have a look at the spark-perf repo if you want something a
> little more principled/automatic. 

I overlooked this. I will give it a try.

Best,



-----
-- Yu Ishikawa
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-the-best-way-to-build-my-developing-Spark-for-testing-on-EC2-tp8638p8677.html



Re: What is the best way to build my developing Spark for testing on EC2?

Posted by Evan Sparks <ev...@gmail.com>.
I recommend using the data generators provided with MLlib to generate synthetic data for your scalability tests, provided they are well suited to your algorithms. They let you control things like the number of examples and the dimensionality of your dataset, as well as the number of partitions. 
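
(A sketch of driving one of these generators from the command line. The class
name, argument order, and jar path below are from memory and should be checked
against the usage string in the source on your branch; the master URL and output
path are placeholders:)

```shell
# Generate synthetic k-means input with MLlib's bundled generator, which
# exposes a main() entry point. Arguments are roughly: master URL, output
# directory, number of points, k, dimensionality, scaling factor, and
# optionally the number of partitions -- verify against the source.
./bin/spark-submit \
  --class org.apache.spark.mllib.util.KMeansDataGenerator \
  ./assembly/target/scala-2.10/spark-assembly-*.jar \
  spark://<master>:7077 hdfs:///data/kmeans 1000000 10 100 0.6 64
```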

As far as cluster set up goes, I usually launch spot instances with the spark-ec2 scripts, and then check out a repo which contains a simple driver application for my code. Then I have something crude like bash scripts running my program and collecting output. 
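
(Such a crude harness might look like the following; the driver class, jar
name, and the driver's own flags are placeholders for whatever your repo
provides:)

```shell
#!/usr/bin/env bash
# Sweep over core counts and input sizes, collecting one log per run.
# my-mllib-driver.jar and com.example.MyAlgorithmDriver are placeholders.
mkdir -p results
for cores in 4 8 16 32; do
  for examples in 1000000 10000000; do
    out="results/cores-${cores}_n-${examples}.log"
    ./bin/spark-submit --class com.example.MyAlgorithmDriver \
      --total-executor-cores "$cores" \
      my-mllib-driver.jar --num-examples "$examples" \
      > "$out" 2>&1
  done
done
```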

You could have a look at the spark-perf repo if you want something a little more principled/automatic. 
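
(Rough shape of a spark-perf run, assuming the databricks/spark-perf layout;
check its README for the settings your version expects:)

```shell
# spark-perf drives a suite of Spark/MLlib benchmarks from one config file.
git clone https://github.com/databricks/spark-perf.git
cd spark-perf
cp config/config.py.template config/config.py
# Edit config/config.py: point it at your Spark build and select the
# MLlib tests and parameter sweeps to run, then:
./bin/run
```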

- Evan

> On Oct 2, 2014, at 5:37 PM, Yu Ishikawa <yu...@gmail.com> wrote:
> 
> Hi all, 
> 
> I am trying to contribute some machine learning algorithms to MLlib. 
> I need to evaluate their performance on a cluster, varying the input data 
> size, the number of CPU cores, and the algorithms' other parameters.
> 
> I would like to build my development version of Spark on EC2 automatically. 
> Is there already a build script for a development version, like the spark-ec2
> script?
> Or, if you have any good ideas for evaluating the performance of an 
> in-development MLlib algorithm on a Spark cluster such as EC2, could you share them?
> 
> Best,
> 
