Posted to dev@spark.apache.org by Tamer TAS <ta...@outlook.com> on 2015/03/11 21:22:39 UTC

Apache Spark GSOC 2015

Hello Everyone,

I'm a senior year computer engineering student in Turkey.
My main areas of interest are cloud computing and machine learning.

I've been working with Apache Spark through the Scala API for a few months. My projects involved using MLlib for a movie recommendation system and a stock prediction model. I would be interested in working on Spark for GSOC 2015. From my experience, there are a few enhancements that could be made:
 - Learning models could be standardized in a hierarchical manner to increase code quality and make future algorithm implementations easier. For example, even though SVD++ lives in the GraphX library, it doesn't have a model implementation; it currently only returns the raw pieces of the calculation (see the first sketch after this list). The documentation isn't clear either, apart from the link to the SVD++ paper.
 - New algorithms could be implemented, such as restricted Boltzmann machines, tensor models and tensor factorization for a recommendation sub-library, and multi-class SVM classification.
 - Testing documentation is close to nonexistent (only a link to a blog post). Each test creates a new SparkContext, so workarounds were necessary to keep the pass/fail/refactor cycle from taking too long (see the second sketch below).
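To make the first point concrete, here is a minimal sketch of the kind of model class SVD++ could return instead of raw pieces. The class name and predict method are mine, and I'm assuming the vertex-attribute layout that SVDPlusPlus.run currently produces (item factors in _1, user factors plus the implicit-feedback term in _2, biases in _3), so treat the details as illustrative only:

import org.apache.spark.SparkContext._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.SVDPlusPlus

// Hypothetical wrapper: SVDPlusPlus.run returns a raw
// (Graph[(Array[Double], Array[Double], Double, Double), Double], Double)
// pair of the trained factor graph and the global mean rating.
// A model class could hide the prediction math behind an API.
class SVDPlusPlusModel(
    val graph: Graph[(Array[Double], Array[Double], Double, Double), Double],
    val meanRating: Double) extends Serializable {

  // Predict one (user, item) rating from the learned factors.
  // The per-call vertex lookup is only meant to show the API shape,
  // not to be efficient.
  def predict(user: VertexId, item: VertexId): Double = {
    val attrs = graph.vertices
      .filter { case (id, _) => id == user || id == item }
      .collectAsMap()
    val (userAttr, itemAttr) = (attrs(user), attrs(item))
    val dot = itemAttr._1.zip(userAttr._2).map { case (q, p) => q * p }.sum
    meanRating + userAttr._3 + itemAttr._3 + dot
  }
}

It would be constructed from the existing output, e.g. val (g, mean) = SVDPlusPlus.run(edges, conf); val model = new SVDPlusPlusModel(g, mean).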
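And here is roughly the workaround I mean for the testing point: a small ScalaTest mix-in (the trait is my own, not something shipped in Spark's public API) that shares one SparkContext across a whole suite instead of paying the startup cost per test:

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite, Suite}

// Share a single SparkContext across all tests in a suite instead of
// creating and stopping one per test case.
trait SharedSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient var sc: SparkContext = _

  override def beforeAll() {
    super.beforeAll()
    sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("test"))
  }

  override def afterAll() {
    if (sc != null) sc.stop()
    super.afterAll()
  }
}

// Example suite: every test reuses the same context.
class WordCountSuite extends FunSuite with SharedSparkContext {
  test("countByValue counts duplicates") {
    val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
    assert(counts("a") == 2)
  }
}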
But don't get the idea that I dislike Spark for not having those features. I loved working with Spark and I'd be happy to work on improving it, mainly the model hierarchy and new machine learning algorithms for MLlib and GraphX, if anyone would be interested in mentoring. I'll work on a proposal with more details about the algorithms and a timeline; I just wanted to give a heads-up before doing so.
If you have any questions, please feel free to ask.
Thanks in advance.

Tamer Tas