Posted to user@spark.apache.org by Mike <sp...@good-with-numbers.com> on 2013/08/24 02:12:44 UTC
sample data & code for performance tests
I'm looking to put together some representative tests for Spark. Where
can I find such data and code? There must be some already existing.
Some tests (logistic regression, k-means, PageRank) are mentioned in the
RDD paper, for example.
Re: sample data & code for performance tests
Posted by Matei Zaharia <ma...@gmail.com>.
Hi Mike,
This project contains some small synthetic benchmarks: https://github.com/amplab/spark-perf. Otherwise, for ML algorithms, look in mllib -- it comes with driver programs for K-means, logistic regression, matrix factorization, etc., as well as data generators for them.
Matei
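For readers who want a feel for what a synthetic data generator for a k-means test might look like, here is a minimal sketch in plain Python. It is not the generator shipped with mllib or spark-perf; the function names (generate_kmeans_points, time_it) and all parameters are hypothetical and chosen only for illustration. It scatters Gaussian points around randomly placed centers and times the generation step.

```python
import random
import time


def generate_kmeans_points(num_points, num_clusters, dim, spread=0.5, seed=42):
    """Return (centers, points): Gaussian points scattered around random centers.

    Hypothetical helper for illustration only; not part of Spark or spark-perf.
    """
    rng = random.Random(seed)  # fixed seed keeps benchmark input reproducible
    centers = [[rng.uniform(-10.0, 10.0) for _ in range(dim)]
               for _ in range(num_clusters)]
    points = []
    for i in range(num_points):
        # Round-robin assignment gives near-equal cluster sizes.
        center = centers[i % num_clusters]
        points.append([rng.gauss(c, spread) for c in center])
    return centers, points


def time_it(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    (centers, points), elapsed = time_it(generate_kmeans_points, 10000, 5, 3)
    print(f"generated {len(points)} points in {elapsed:.3f}s")
```

Fixing the seed means repeated runs see identical input, so timing differences come from the code under test rather than from the data. The same pattern could be used to write the points to a file and feed them to whichever k-means driver is being measured.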