Posted to dev@spark.apache.org by salexln <sa...@gmail.com> on 2014/06/30 17:54:58 UTC

Contributing to MLlib

Hi guys,

I'm new to Spark & MLlib and this may be a dumb question, but still....

As part of my M.Sc. project, I'm working on an implementation of the Fuzzy
C-means (FCM) algorithm in MLlib.
FCM has much in common with the K-Means algorithm, which is already
implemented, and I wanted to know whether I should create some inheritance
between them (a base class that would hold all the common code).

I could not find an answer to that in the Spark Code Style Guide
(https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide).


Appreciate your help

thanks,
Alex 




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-tp7125.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Contributing to MLlib

Posted by salexln <sa...@gmail.com>.
Thanks for the input.
At the moment, I don't have any code commits.

I wanted to discuss the algorithm implementation prior to the code
submission.

(I've never worked with Git/GitHub, so I hope this isn't very basic stuff...)









Re: Contributing to MLlib

Posted by Xiangrui Meng <me...@gmail.com>.
Alex, please send the pull request to apache/spark instead of your own
repo, following the instructions in

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

Thanks,
Xiangrui

On Wed, Jul 2, 2014 at 12:41 PM, RJ Nowling <rn...@gmail.com> wrote:
> Hey Alex,
>
> I'm also a new contributor.  I created a pull request for the KMeans
> MiniBatch implementation here:
>
> https://github.com/apache/spark/pull/1248
>
> I also created a JIRA here:
>
> https://issues.apache.org/jira/browse/SPARK-2308
>
> As part of my work, I started to refactor the common code to create a
> KMeansCommons file containing traits for the KMeans classes and KMeans
> objects.
>
> We should probably coordinate a bit on this.
>
> RJ
> rnowling@gmail.com

Re: Contributing to MLlib

Posted by RJ Nowling <rn...@gmail.com>.
Hey Alex,

I'm also a new contributor.  I created a pull request for the KMeans
MiniBatch implementation here:

https://github.com/apache/spark/pull/1248

I also created a JIRA here:

https://issues.apache.org/jira/browse/SPARK-2308

As part of my work, I started to refactor the common code to create a
KMeansCommons file containing traits for the KMeans classes and KMeans
objects.

We should probably coordinate a bit on this.

RJ
rnowling@gmail.com

On Wed, Jul 2, 2014 at 3:07 PM, salexln <sa...@gmail.com> wrote:
> I opened a JIRA (https://issues.apache.org/jira/browse/SPARK-2344)
>
> and a pull request for this (https://github.com/salexln/spark/pull/1)



-- 
em rnowling@gmail.com
c 954.496.2314

Re: Contributing to MLlib

Posted by salexln <sa...@gmail.com>.
I opened a JIRA (https://issues.apache.org/jira/browse/SPARK-2344)

and a pull request for this (https://github.com/salexln/spark/pull/1)




Re: Contributing to MLlib

Posted by salexln <sa...@gmail.com>.
Thanks for the response!

That is exactly the way I wanted to implement it :)

I will create a JIRA ticket and a pull request.




Re: Contributing to MLlib

Posted by "Evan R. Sparks" <ev...@gmail.com>.
Hi there,

Generally we try to avoid duplicating logic if possible, particularly for
algorithms that share a great deal of algorithmic similarity. See, for
example, the way we implement Logistic regression vs. Linear regression vs.
Linear SVM with different gradient functions all on top of SGD or L-BFGS.
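
The pattern described above (one shared optimizer, several gradient
functions) could be sketched roughly as follows. This is only an illustration
in plain Python, not MLlib's actual API: every function name here is
hypothetical, labels are assumed to be 0/1 for the two classifiers, and
full-batch gradient descent stands in for SGD or L-BFGS so that the result is
deterministic.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def least_squares_gradient(w, x, y):
    # Linear regression: gradient of 0.5 * (w.x - y)^2 with respect to w.
    err = dot(w, x) - y
    return [err * xi for xi in x]


def logistic_gradient(w, x, y):
    # Logistic regression, label y in {0, 1}: gradient of the log loss.
    p = 1.0 / (1.0 + math.exp(-dot(w, x)))
    return [(p - y) * xi for xi in x]


def hinge_gradient(w, x, y):
    # Linear SVM, label y in {0, 1} mapped to {-1, +1}: hinge-loss subgradient.
    s = 2.0 * y - 1.0
    if s * dot(w, x) < 1.0:
        return [-s * xi for xi in x]
    return [0.0] * len(x)


def gradient_descent(gradient, data, num_features, step_size=0.1, iterations=200):
    """Shared optimizer: the model is chosen only by the `gradient` argument."""
    w = [0.0] * num_features
    n = len(data)
    for _ in range(iterations):
        total = [0.0] * num_features
        for x, y in data:
            g = gradient(w, x, y)
            total = [t + gi for t, gi in zip(total, g)]
        # Step against the average gradient over the whole data set.
        w = [wi - step_size * t / n for wi, t in zip(w, total)]
    return w


if __name__ == "__main__":
    # Each feature vector starts with a constant 1.0 acting as the intercept.
    regression_data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
    labeled_data = [([1.0, -2.0], 0.0), ([1.0, -1.0], 0.0),
                    ([1.0, 1.0], 1.0), ([1.0, 2.0], 1.0)]
    print(gradient_descent(least_squares_gradient, regression_data, 2))
    print(gradient_descent(logistic_gradient, labeled_data, 2))
    print(gradient_descent(hinge_gradient, labeled_data, 2))
```

The three models differ only in the function passed to the optimizer, which
is the kind of reuse meant here.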

Based on my (brief) look at the FCM algorithm, it appears that the main
difference is that each point gets a weight vector giving its degree of
membership in every cluster, rather than a single hard assignment to the
nearest centroid. My guess is that you can figure out a way to inherit much
of the K-Means logic in an algorithm for FCM.
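
As a rough sketch of how that sharing might look, the iteration loop and the
weighted centroid update can live in a base class, with only the per-point
weights differing between the two algorithms. The class names below are
hypothetical and are not existing MLlib classes; the code is plain
single-machine Python rather than Scala on RDDs, and it uses the standard FCM
update with a fuzziness parameter m > 1.

```python
from abc import ABC, abstractmethod


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClustering(ABC):
    """Hypothetical base class holding the logic common to both algorithms."""

    def __init__(self, initial_centers, iterations=50):
        self.centers = [list(c) for c in initial_centers]
        self.iterations = iterations

    @abstractmethod
    def memberships(self, point):
        """Return one update weight per center (the only step that differs)."""

    def fit(self, points):
        for _ in range(self.iterations):
            k, dim = len(self.centers), len(self.centers[0])
            sums = [[0.0] * dim for _ in range(k)]
            totals = [0.0] * k
            for p in points:
                for j, w in enumerate(self.memberships(p)):
                    totals[j] += w
                    for d in range(dim):
                        sums[j][d] += w * p[d]
            # Weighted mean; a center that received no weight stays where it is.
            self.centers = [
                [s / totals[j] for s in sums[j]] if totals[j] > 0 else self.centers[j]
                for j in range(k)
            ]
        return self


class HardKMeans(CentroidClustering):
    def memberships(self, point):
        # K-Means: weight 1 for the nearest center, 0 for all others.
        dists = [squared_distance(point, c) for c in self.centers]
        nearest = dists.index(min(dists))
        return [1.0 if j == nearest else 0.0 for j in range(len(dists))]


class FuzzyCMeans(CentroidClustering):
    def __init__(self, initial_centers, iterations=50, fuzziness=2.0):
        super().__init__(initial_centers, iterations)
        self.fuzziness = fuzziness  # the usual "m" parameter, must be > 1

    def degrees(self, point):
        """Membership degrees u_j of `point`; they sum to 1 over the centers."""
        dists = [squared_distance(point, c) for c in self.centers]
        if 0.0 in dists:  # the point sits exactly on one or more centers
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            return [h / sum(hits) for h in hits]
        # u_j is proportional to (1 / squared distance) ** (1 / (m - 1)).
        inv = [(1.0 / d) ** (1.0 / (self.fuzziness - 1.0)) for d in dists]
        total = sum(inv)
        return [v / total for v in inv]

    def memberships(self, point):
        # FCM weights each point by u_j ** m in the centroid update.
        return [u ** self.fuzziness for u in self.degrees(point)]


if __name__ == "__main__":
    data = [[0.0], [1.0], [9.0], [10.0]]
    print(HardKMeans([[0.0], [10.0]]).fit(data).centers)
    print(FuzzyCMeans([[0.0], [10.0]]).fit(data).centers)
```

Whether inheritance or shared traits is the better fit for the real code is
something to settle on the pull request.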

Regardless, if you'd like to add an algorithm, please create a JIRA ticket
for it and then send a pull request which references that JIRA where we can
discuss the specifics of that implementation and whether it is of broad
enough interest to warrant inclusion in the library.

- Evan


On Wed, Jul 2, 2014 at 11:02 AM, salexln <sa...@gmail.com> wrote:

> guys??? anyone???

Re: Contributing to MLlib

Posted by salexln <sa...@gmail.com>.
guys??? anyone???


