Posted to dev@spark.apache.org by Ajay Nair <pr...@gmail.com> on 2014/04/23 07:03:31 UTC

Spark on Wikipedia dataset

I am going to run some test experiments on the Wikipedia dataset using
the Spark framework. I know the Wikipedia dataset has probably been
analyzed already, but which aspects of Spark, explored or unexplored,
could be tested and benchmarked on it?

Thanks
AJ

Re: Spark on Wikipedia dataset

Posted by Mayur Rustagi <ma...@gmail.com>.
Huge joins would be interesting. I do all my Shark demos on the Wikipedia
dataset. Joins are a typical pain point, which makes them great to showcase
and show off :)
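To make the join suggestion a little more concrete, here is a small local sketch of a key-value inner join. Its output has the same shape as a Spark pair-RDD join: one (key, (left_value, right_value)) tuple per matching pair. The records are hypothetical Wikipedia-style stand-ins, namely page IDs mapped to titles and link records keyed by source page ID. This is a single-machine illustration of the join semantics, not a benchmark, and not how Spark physically executes the join.

```python
from collections import defaultdict


def hash_join(left, right):
    """Inner join of two lists of (key, value) pairs.

    Mirrors the output shape of a pair-RDD join: one
    (key, (left_value, right_value)) tuple per matching pair.
    """
    # Build phase: index the right-hand side by key.
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    # Probe phase: stream the left-hand side and emit every match.
    # index.get() does not insert missing keys into the defaultdict.
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, [])]


# Hypothetical sample data: (page_id, title) and (source_page_id, target_page_id).
pages = [(1, "Apache Spark"), (2, "Hadoop"), (3, "Scala")]
links = [(1, 3), (1, 2), (2, 1), (4, 1)]  # page 4 has no title, so it drops out

titled_links = hash_join(pages, links)
print(titled_links)
# → [(1, ('Apache Spark', 3)), (1, ('Apache Spark', 2)), (2, ('Hadoop', 1))]
```

In PySpark the equivalent step would be roughly `sc.parallelize(pages).join(sc.parallelize(links))`, where result order is not guaranteed. On a full link table, the interesting part to benchmark is the shuffle that typically moves both sides across the cluster by key.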

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


