Posted to dev@spark.apache.org by Manolis Gemeliaris <ge...@gmail.com> on 2022/05/05 17:17:31 UTC
An online kmeans algorithm for Spark
Hello everyone on the Dev team of Apache Spark.
My name is Manolis Gemeliaris and I am a student at the Hellenic
Mediterranean University (formerly TEI of Crete). For my thesis project I
would like to add an online kmeans algorithm to Apache Spark, based on the
paper <https://arxiv.org/abs/1412.5721> by Edo Liberty et al. and the
authors' Python implementation
<https://github.com/sviri/kmeans/tree/main/onlineKmeans/src>.
From what I have read, getting something like this officially accepted is
a long process. I would therefore like to build it as an open-source
third-party package instead, compatible with Apache Spark 3.
I have already read the contribution guidelines for Spark and taken some
time studying the code on GitHub.
I would like to ask whether anyone can find the time to help me get
started. I realize that your time is valuable, so any tips you can share
would be greatly appreciated.
Thank you in advance,
Best Regards,
Manolis Gemeliaris