Posted to dev@spark.apache.org by Kannan Rajah <kr...@maprtech.com> on 2015/02/02 23:26:43 UTC

Performance test for sort shuffle

Is there a recommended performance test for sort-based shuffle, something
similar to TeraSort on Hadoop? I couldn't find one in the spark-perf code
base.

https://github.com/databricks/spark-perf

--
Kannan

Re: Performance test for sort shuffle

Posted by Ewan Higgs <ew...@ugent.be>.
Hi Kannan,
I have a branch here:

https://github.com/ehiggs/spark/tree/terasort

The code is in the examples. I don't do any fancy partitioning, so it
could be made quicker, I'm sure, but it should be a good baseline.

I have a WIP PR for spark-perf, but I'm having trouble building it
there [1]. I've put it on the back burner until someone can get back to me
on it.

Yours,
Ewan Higgs

[1] http://apache-spark-developers-list.1001551.n3.nabble.com/SparkSpark-perf-terasort-WIP-branch-tt10105.html
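
For context on the "fancy partitioning" remark in the reply: TeraSort-style
jobs commonly sample the keys, derive split points from the sample, and route
each record to a key range so that sorting every partition locally yields a
globally sorted result. The sketch below illustrates that idea in plain Python
only. It is not the code from the terasort branch and does not use Spark; the
function names, the sample size, and the partition count are all hypothetical
choices made for illustration.

```python
import bisect
import random


def make_records(n, seed=0):
    """Generate n (key, value) pairs: 10-byte random keys, 90-byte payloads."""
    rng = random.Random(seed)
    return [
        (bytes(rng.getrandbits(8) for _ in range(10)), bytes(90))
        for _ in range(n)
    ]


def split_points(records, num_partitions, sample_size=1000, seed=1):
    """Pick up to num_partitions - 1 boundary keys from a random key sample."""
    rng = random.Random(seed)
    picked = rng.sample(records, min(sample_size, len(records)))
    sample = sorted(key for key, _ in picked)
    if not sample:
        return []  # no data: everything falls into partition 0
    step = len(sample) / num_partitions
    return [sample[int(i * step)] for i in range(1, num_partitions)]


def range_partition_sort(records, num_partitions):
    """Route records to key ranges, then sort each partition on its own."""
    bounds = split_points(records, num_partitions)
    parts = [[] for _ in range(num_partitions)]
    for rec in records:
        # bisect_right maps a key to the index of the range that contains it.
        parts[bisect.bisect_right(bounds, rec[0])].append(rec)
    for part in parts:
        part.sort(key=lambda r: r[0])
    return parts


if __name__ == "__main__":
    parts = range_partition_sort(make_records(10000), num_partitions=8)
    print([len(p) for p in parts])  # partition sizes, roughly balanced
```

Because the partitions cover disjoint, ordered key ranges, concatenating them
in order gives the fully sorted data set with no final merge step. A baseline
without this kind of partitioning can still sort correctly, but it may
distribute the work less evenly.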
