Posted to user@spark.apache.org by Mike <sp...@good-with-numbers.com> on 2013/08/24 02:12:44 UTC

sample data & code for performance tests

I'm looking to put together some representative performance tests for
Spark.  Where can I find suitable data and code?  Some must already
exist: the RDD paper, for example, mentions tests using logistic
regression, k-means, and PageRank.

Re: sample data & code for performance tests

Posted by Matei Zaharia <ma...@gmail.com>.
Hi Mike,

This project contains some small synthetic benchmarks: https://github.com/amplab/spark-perf. Otherwise, for ML algorithms, look in mllib -- it comes with driver programs for K-means, logistic regression, matrix factorization, etc., as well as data generators for them.
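
To give a rough idea of what a synthetic data generator for a clustering test might look like, here is a minimal sketch in plain Python. It is not the mllib generator and does not use any Spark API; the function names (generate_kmeans_points, to_lines), the parameters, and the space-separated output format are all assumptions made for illustration.

```python
import random


def generate_kmeans_points(num_points, k, dims, spread=0.5, seed=42):
    """Return (centers, points) for a synthetic clustering data set.

    Hypothetical helper: picks k random centers, then draws each point
    from a Gaussian around one of them.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    centers = [[rng.uniform(-10.0, 10.0) for _ in range(dims)] for _ in range(k)]
    points = []
    for i in range(num_points):
        center = centers[i % k]  # round-robin assignment keeps clusters balanced
        points.append([rng.gauss(c, spread) for c in center])
    return centers, points


def to_lines(points):
    """Format each point as one line of space-separated values (assumed format)."""
    return [" ".join(f"{x:.6f}" for x in p) for p in points]


if __name__ == "__main__":
    centers, points = generate_kmeans_points(num_points=1000, k=3, dims=2)
    for line in to_lines(points)[:5]:
        print(line)
```

Fixing the seed makes timings comparable across runs, since every run sees identical input; scale num_points and dims up to whatever size the test needs.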

Matei