Posted to dev@mahout.apache.org by urun dogan <ur...@gmail.com> on 2011/11/15 00:04:21 UTC

What to Implement/Improve/Document?

Hi All;

I want to congratulate all of the contributors to the project. I find the
idea of this project very nice and I want to contribute.

I am a postdoctoral researcher working on developing machine learning
algorithms. During my PhD I developed several multiclass SVM techniques and
solvers. Now I am involved in a European Union project dealing with
large-scale machine learning problems. I have 5-6 years of C++ development
experience and I like developing and implementing new machine learning
techniques (yes, I know that Mahout uses Java :), I will try my best).

My main areas of expertise are classification, regression, and transfer
learning. I have seen several open topics on http://mahout.apache.org/:

1) Locally Weighted Linear Regression

2) Gaussian Discriminative Analysis

3) Independent Component Analysis

4) Principal Components Analysis

5) Classification with Perceptron or Winnow

6) Neural Network

I am aware that there are also some open issues in Jira. I can work on
anything. I think that before starting any kind of coding I should get
comments from the experts on this project. What do you recommend I start
with?

Cheers

Ueruen

Re: What to Implement/Improve/Document?

Posted by Grant Ingersoll <gs...@apache.org>.
https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute has some tips, ideas, etc.  It's usually best to start with a few small patches to get your feet wet w/ the development process.  



--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: What to Implement/Improve/Document?

Posted by Raphael Cendrillon <ce...@gmail.com>.
Hi Urun,

I'm in a very similar situation. I have a background (PhD) in optimization and signal processing and some experience with principal component analysis. I'm fairly comfortable with Java. 

I'm also very interested in Mahout, and large scale problems. 

If we can find a suitable area I would be very happy to work together with you. 

One area I've been reading more into lately is parallelization of SVM training algorithms, although I'm not sure whether this is a worthwhile direction. 

Any recommendations would be much appreciated!


Re: What to Implement/Improve/Document?

Posted by Ted Dunning <te...@gmail.com>.
Regarding linear classifiers, I think that the cluster/classifier
unification and introduction of ASGD are the only items of substantial
impact.

On Wed, Nov 16, 2011 at 9:39 AM, Josh Patterson <jo...@cloudera.com> wrote:

> Could you then make a list of JIRAs that you think are more
> interesting in the near term, possibly more relevant?
>

Re: What to Implement/Improve/Document?

Posted by Josh Patterson <jo...@cloudera.com>.
I'd have to admit my interest in SVMs is more of the "abstract curiosity"
nature.

As for where focus is needed in the near term, similar to how Grant tagged:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+MAHOUT_INTRO_CONTRIBUTE

Could you then make a list of JIRAs that you think are more
interesting in the near term, possibly more relevant?

JP




-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com

Re: What to Implement/Improve/Document?

Posted by urun dogan <ur...@gmail.com>.
OK, today I will start reading the SGD code in the repo and think about how
to implement ASGD nicely. Your tricks are really useful.
On 17 Nov 2011 18:16, "Ted Dunning" <te...@gmail.com> wrote:

Re: What to Implement/Improve/Document?

Posted by Ted Dunning <te...@gmail.com>.
The key tricks are:

- do the updates of the averaged model in a sparse fashion; this will
require doubling the space kept by the model

- determine when to switch to averaging

In addition, we should bring in at the same time:

- more flexibility in the loss function (to allow the code to implement SVM)
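
As a rough illustration of these tricks, here is a minimal dense sketch of averaged SGD with a pluggable loss. All names are hypothetical (this is not Mahout code), and the sparse lazy-update optimization is elided, but the doubled state (current iterate plus running average) and the averaging switch point are shown:

```python
import numpy as np

def hinge_grad(w, x, y):
    """Subgradient of the hinge loss max(0, 1 - y*w.x).  Swapping this
    for a logistic gradient recovers logistic-regression SGD."""
    return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(w)

def asgd_train(X, y, loss_grad, lam=1e-4, eta0=0.5, avg_start=None):
    """Averaged SGD (Polyak-Ruppert): plain SGD until avg_start, then
    also maintain a running average of the iterates.  Keeping both w and
    w_avg is the 'doubling of the space' kept by the model; a sparse
    implementation would update w_avg lazily, per touched feature."""
    n, d = X.shape
    w = np.zeros(d)          # current iterate
    w_avg = np.zeros(d)      # averaged model (second copy of the state)
    if avg_start is None:
        avg_start = n // 2   # heuristic: switch to averaging halfway in
    k = 0
    for t in range(n):
        eta = eta0 / (1.0 + lam * eta0 * t)              # decaying step
        w -= eta * (loss_grad(w, X[t], y[t]) + lam * w)  # SGD update
        if t >= avg_start:
            k += 1
            w_avg += (w - w_avg) / k                     # running mean
    return w_avg if k else w
```

Passing a logistic gradient instead of `hinge_grad` gives back plain logistic regression, which is the loss-function flexibility mentioned above.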



Re: What to Implement/Improve/Document?

Posted by urun dogan <ur...@gmail.com>.
Hi Ted;

I started to read the paper and I think I will finish it today. It is quite
a nice approach, and thanks for the supervision.

Cheers
Ürün


Re: What to Implement/Improve/Document?

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Nov 16, 2011 at 9:50 AM, urun dogan <ur...@gmail.com> wrote:

>
> I had written the previous email before reading Josh's email. Are there
> any objections if I conclude that implementation of SGD/ASGD-based
> methods has priority, and that I should therefore start implementing
> them soon?
>

I think that they are important.  But I haven't been able to partition off
enough time to actually do it so my vote is degraded somewhat.  I do know
that people I have worked with would benefit from the results shown in the
Xu paper.

> @Ted: If this is the case, I am looking forward to having your
> supervision on this issue.
>

Excellent.

Have you looked at the Xu paper?

Re: What to Implement/Improve/Document?

Posted by urun dogan <ur...@gmail.com>.
Hi,

I had written the previous email before reading Josh's email. Are there any
objections if I conclude that implementation of SGD/ASGD-based methods has
priority, and that I should therefore start implementing them soon?

@Ted: If this is the case, I am looking forward to having your supervision
on this issue.

Cheers
Ürün

Re: What to Implement/Improve/Document?

Posted by urun dogan <ur...@gmail.com>.
Hi;

First of all, I do not want to be in the position of defending the Pegasos
algorithm. I am interested in implementing SGD/ASGD or Pegasos, or even
both. Of course, what I want is to have useful code for the project (not
only for me). I have no preference among algorithms as long as they are
fast enough to solve problems.

> Based on Leon Bottou's results, I would recommend a simple SGD
> implementation of SVM rather than Pegasos.
>
> http://leon.bottou.org/projects/sgd
> http://leon.bottou.org/publications/pdf/compstat-2010.pdf
> http://arxiv.org/abs/1107.2490
>
>
> > However, I think there are two issues we need to clarify:
> >
> > 1) In general, SGD-like methods are used for online learning (of course
> > they can be converted to batch learning), while Pegasos is used for
> > batch learning.
> >
>
> I see no need for batch learning unless there is a net training benefit.
>

Point taken. I do not have enough large scale experience.


>
>
> > Therefore we may need two similar but sufficiently different software
> > architectures (I am not sure). If my intuition is right, then it makes
> > sense to implement Pegasos and SGD independently. Further, Pegasos in
> > particular is a state-of-the-art method (in terms of speed) for text
> > classification, structured-data prediction, and similar problems; maybe
> > this is also a point we need to take into account, because there are
> > thousands of people dealing with web-scale text data for search engines
> > and recommender systems (I am not one of them, so I may be wrong here).
> >
>
> Pegasos is nice, but I don't necessarily see it as state of the art.
>
> For large-scale problems, in fact, I don't even see SVM as state of the
> art.  Most (not all) large-scale problems tend to be sparse and very high
> dimension.  This makes simple linear classifiers with L1 regularization
> very effective and often more effective than L2 regularization as with SVM.
>

Point taken. I completely agree that L1 regularization is very effective
for most (not all) large-scale data sets (at least I see this in the
published papers).

>
>
>
> > 2) Pegasos will be faster than any other SVM solver only for linear
> > kernels.
>
>
> I don't see this in the literature.  See Xu's paper, referenced above.
>

Your comments are very interesting and I will definitely read Xu's paper
(thanks a lot). However, when I look at Table 2 of Shai Shalev-Shwartz's
paper (reference given in my previous email), it seems that Pegasos is not
faster than SVM-Light when used with a nonlinear kernel. I believe this is
caused by the kernel evaluations used in kernelized Pegasos (the algorithm
is given in Figure 3 of the same paper).
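
For illustration, kernelized Pegasos along the lines of that Figure 3 can be sketched as follows (hypothetical code, not a Mahout API; the kernel and its parameter are assumed values). Note that every step evaluates the kernel against all previously updated examples, which is exactly the cost described above:

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed illustrative value."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_pegasos(X, y, kernel, lam=1e-2, iters=2000, seed=0):
    """Kernelized Pegasos: instead of a weight vector it keeps alpha[j],
    the number of times example j triggered an update.  Each step must
    evaluate the kernel against every example with alpha[j] > 0 -- the
    per-iteration cost that makes the nonlinear case slow."""
    rng = np.random.default_rng(seed)
    n = len(y)
    alpha = np.zeros(n)
    for t in range(1, iters + 1):
        i = rng.integers(n)
        # decision value: (1/(lam*t)) * sum_j alpha_j y_j K(x_i, x_j)
        s = sum(alpha[j] * y[j] * kernel(X[i], X[j])
                for j in range(n) if alpha[j] > 0)
        if y[i] * s / (lam * t) < 1.0:   # margin violated: add the point
            alpha[i] += 1.0

    def predict(x):
        s = sum(alpha[j] * y[j] * kernel(x, X[j])
                for j in range(n) if alpha[j] > 0)
        return 1.0 if s >= 0.0 else -1.0

    return predict
```

The inner sum grows with the number of support points, so unlike the linear case there is no constant-time update.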


>
> > In the past there was a belief that Pegasos could be applied to
> > nonlinear kernels (Gaussian kernel, string kernel, HMM kernel, etc.)
> > and would still be faster than other SVM/SMO-like solvers.
>
>
> I am not hearing a huge need for non-linear kernels in large scale
> learning.  Perhaps with image processing, but not with much else.  Also, I
> haven't heard that there isn't an SGD-like learning method for non-linear
> kernels.
>

Point taken.

>
> > ... It is also a known fact that, with appropriate model selection,
> > nonlinear kernels give better classification accuracy than linear
> > kernels.
> >
>
> Actually, not.  I think that the situations where non-linear kernels are
> better are more limited than most suppose, particularly for large-scale
> applications.
>
>
> > Exactly at this point we need online learning (an SGD/ASGD-based
> > method): we can still use nonlinear kernels, parallelize the algorithm,
> > and have an online SVM method for large/web-scale datasets.
> >
>
> Now this begins to sound right.
>
> > Honestly, I am very much into SVMs and kernel machines, and I fear that
> > I am making a big fuss out of small problems.
>
>
> My key question is whether you have problems that need solving.  Or do you
> have an itch to do an implementation for the sake of having the
> implementation?
>
> Either one is a reasonable motive, but the first is preferable.
>

As I mentioned above, I am really interested in applying machine learning
algorithms to large/web-scale data. This does not mean that I am a fan of a
particular algorithm. That is why I started this discussion and titled it
"What to Implement/Improve/Document?". From your comments, and also from
the comments of other people, I have started to learn a lot (thanks a lot).
I do not have enough practical experience with large/web-scale data
problems, so I am open to comments from everyone. Since you have stated
that "I see no need for batch learning unless there is a net training
benefit", SGD/ASGD-based algorithms should have priority over batch
learning methods. I look forward to more comments and I hope we will reach
a decision on what to implement.

Cheers
Ürün

Re: What to Implement/Improve/Document?

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Nov 16, 2011 at 12:09 AM, urun dogan <ur...@gmail.com> wrote:

> Hi All;
>
> As I mentioned, I find it really interesting to implement SGD and
> Pegasos. We can add Pegasos into the SGD modules.


Based on Leon Bottou's results, I would recommend a simple SGD
implementation of SVM rather than Pegasos.

http://leon.bottou.org/projects/sgd
http://leon.bottou.org/publications/pdf/compstat-2010.pdf
http://arxiv.org/abs/1107.2490


> However, I think there are two issues we need to clarify:
>
> 1) In general, SGD-like methods are used for online learning (of course
> they can be converted to batch learning), while Pegasos is used for batch
> learning.
>

I see no need for batch learning unless there is a net training benefit.


> Therefore we may need two similar but sufficiently different software
> architectures (I am not sure). If my intuition is right, then it makes
> sense to implement Pegasos and SGD independently. Further, Pegasos in
> particular is a state-of-the-art method (in terms of speed) for text
> classification, structured-data prediction, and similar problems; maybe
> this is also a point we need to take into account, because there are
> thousands of people dealing with web-scale text data for search engines
> and recommender systems (I am not one of them, so I may be wrong here).
>

Pegasos is nice, but I don't necessarily see it as state of the art.

For large-scale problems, in fact, I don't even see SVM as state of the
art.  Most (not all) large-scale problems tend to be sparse and very high
dimension.  This makes simple linear classifiers with L1 regularization
very effective and often more effective than L2 regularization as with SVM.
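
A minimal sketch of such a classifier (illustrative only; names and parameters are hypothetical, and the soft-threshold step is a simplification of published cumulative-penalty schemes): logistic-loss SGD where a per-step soft threshold stands in for L1 regularization and drives weights on irrelevant features to exactly zero.

```python
import numpy as np

def sgd_l1_logistic(X, y, lam=1e-3, eta0=0.5, epochs=5):
    """SGD on logistic loss with an L1 penalty applied by soft
    thresholding after each step, shown only to illustrate the
    sparsity effect of L1 on a simple linear classifier."""
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in range(n):
            t += 1
            eta = eta0 / (1.0 + eta0 * lam * t)
            z = y[i] * (X[i] @ w)
            w -= eta * (-y[i] * X[i] / (1.0 + np.exp(z)))  # logistic grad
            # soft threshold: shrink every weight toward zero by eta*lam,
            # so weights on irrelevant features land at exactly zero
            w = np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)
    return w
```

On sparse high-dimensional data the thresholding zeroes out most coordinates, which is the effectiveness argument made above.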



> 2) Pegasos will be faster than any other SVM solver only for linear
> kernels.


I don't see this in the literature.  See Xu's paper, referenced above.


> In the past there was a belief that Pegasos could be applied to nonlinear
> kernels (Gaussian kernel, string kernel, HMM kernel, etc.) and would
> still be faster than other SVM/SMO-like solvers.


I am not hearing a huge need for non-linear kernels in large scale
learning.  Perhaps with image processing, but not with much else.  Also, I
haven't heard that there isn't an SGD-like learning method for non-linear
kernels.



> ... It is also a known fact that, with appropriate model selection,
> nonlinear kernels give better classification accuracy than linear kernels.
>

Actually, not.  I think that the situations where non-linear kernels are
better are more limited than most suppose, particularly for large-scale
applications.


> Exactly at this point we need online learning (an SGD/ASGD-based method):
> we can still use nonlinear kernels, parallelize the algorithm, and have
> an online SVM method for large/web-scale datasets.
>

Now this begins to sound right.

> Honestly, I am very much into SVMs and kernel machines, and I fear that I
> am making a big fuss out of small problems.


My key question is whether you have problems that need solving.  Or do you
have an itch to do an implementation for the sake of having the
implementation?

Either one is a reasonable motive, but the first is preferable.

Re: What to Implement/Improve/Document?

Posted by urun dogan <ur...@gmail.com>.
Hi All;

As I mentioned, I find it really interesting to implement SGD and Pegasos.
We can add Pegasos into the SGD modules. However, I think there are two
issues we need to clarify:

1) In general, SGD-like methods are used for online learning (of course
they can be converted to batch learning), while Pegasos is used for batch
learning. Therefore we may need two similar but sufficiently different
software architectures (I am not sure). If my intuition is right, then it
makes sense to implement Pegasos and SGD independently. Further, Pegasos in
particular is a state-of-the-art method (in terms of speed) for text
classification, structured-data prediction, and similar problems; maybe
this is also a point we need to take into account, because there are
thousands of people dealing with web-scale text data for search engines and
recommender systems (I am not one of them, so I may be wrong here).

2) Pegasos will be faster than any other SVM solver only for linear
kernels. In the past there was a belief that Pegasos could be applied to
nonlinear kernels (Gaussian kernel, string kernel, HMM kernel, etc.) and
would still be faster than other SVM solvers/SMO-like solvers. Shai
Shalev-Shwartz (inventor of the Pegasos algorithm) recently published a
paper covering this issue (
http://www.cs.huji.ac.il/~shais/papers/ShalevSiSrCo10.pdf ). In this paper
he showed that Pegasos is not faster than SMO-like solvers for nonlinear
kernels. It is also a known fact that, with appropriate model selection,
nonlinear kernels give better classification accuracy than linear kernels.
Exactly at this point we need online learning (an SGD/ASGD-based method):
we can still use nonlinear kernels, parallelize the algorithm, and have an
online SVM method for large/web-scale datasets.
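
For reference, the primal (linear-kernel) Pegasos update from the Shalev-Shwartz paper can be sketched as follows (illustrative code only, with assumed parameter values; this is not the proposed Mahout implementation):

```python
import numpy as np

def pegasos(X, y, lam=1e-3, iters=10000, seed=0):
    """Primal Pegasos: SGD on the L2-regularized hinge loss with step
    size 1/(lam*t), plus the optional projection onto the ball of radius
    1/sqrt(lam) from the original paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)
        violated = y[i] * (X[i] @ w) < 1.0   # hinge margin check
        w *= (1.0 - eta * lam)               # shrink from the L2 penalty
        if violated:
            w += eta * y[i] * X[i]
        norm = np.linalg.norm(w)
        radius = 1.0 / np.sqrt(lam)
        if norm > radius:                    # optional projection step
            w *= radius / norm
    return w
```

Each iteration touches a single example and one weight vector, which is why the linear case is so fast; the kernelized variant loses this constant-time update.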

Honestly, I am very much into SVMs and kernel machines, and I fear that I
am making a big fuss out of small problems. Please comment on my ideas so
that I can learn from my mistakes/misinterpretations. I am looking forward
to hearing your comments and some ideas about how to proceed.


ps: I will try to get Hadoop, Mahout, and the other tools running on my
system.

ps: @Josh/Ted: I will definitely need your supervision in this sub-project.

Cheers

Ürün





On Tue, Nov 15, 2011 at 11:55 PM, Raphael Cendrillon <
cendrillon1978@gmail.com> wrote:

> Hi Urun and Josh,
>
> I'd also be interested in helping out in whatever way I can.
>
> One question, I've noticed that MAHOUT-334 was not ultimately adopted. Do
> we know the reason for this?
>
> Would it be best to finish out the patch in 232, or instead add the
> functionality into the existing SGD modules as Ted suggested?
>
> On Nov 15, 2011, at 9:46 AM, Josh Patterson <jo...@cloudera.com> wrote:
>
> > Urun,
> > I've been looking at MAHOUT-232 and reading Nello Cristianini's book
> > on SVMs. It sounds like you've done considerably more work than I have
> > in this arena. I'd be interested in collaborating with you on finishing
> > out this patch, if you are interested in that type of arrangement
> > (there is plenty of work to do, so splitting it might be an interesting
> > path), as it would help in terms of "bandwidth" for both of us.
> >
> > I can also help you get used to building Hadoop, Mahout, the tools,
> > etc., if needed.
> >
> > JP
> >
> > On Tue, Nov 15, 2011 at 2:29 PM, urun dogan <ur...@gmail.com> wrote:
> >> Dear Josh and Ted;
> >>
> >> Both ideas are very attractive. Honestly, I want to do both of them. I
> >> am completely aware that this is quite a lot of work. As I mentioned
> >> before, I am a postdoc now and I am trying to develop new techniques
> >> using ASGD. During my PhD I developed an efficient solver for
> >> multiclass SVMs which uses SMO-based techniques. To compare my solver
> >> with others, I implemented Pegasos for a single-core machine in C++.
> >> For both of the methods I have the theoretical background, and I
> >> believe I have enough time for coding these kinds of techniques. I
> >> would appreciate your supervision. I think that implementing and
> >> optimizing an algorithm for cloud computing is very different from
> >> implementing it for a workstation/desktop PC. As I said, I am willing
> >> to contribute on these issues because these projects fit my
> >> experience. However, if you think this is too much work for one
> >> person, your comments are welcome. Then, if you tell me the priority
> >> of these two features, I will implement the most important one first;
> >> and if nobody implements the second one before I finish the first, I
> >> will implement the second one as well.
> >>
> >> Best regards,
> >> Ürün
> >>
> >>
> >>
> >>
> >> On Tue, Nov 15, 2011 at 6:34 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >>
> >>> ASGD is also an opportunity laying on the table.
> >>>
> >>> http://leon.bottou.org/projects/sgd
> >>>
> >>> It would be lovely to have the current SGD system upgraded to use ASGD
> and
> >>> allow multiple loss functions to allow SVM training as well as the
> current
> >>> logistic regression.  I would be happy to supervise, but can't do the
> code
> >>> right now.
> >>>
> >>> On Tue, Nov 15, 2011 at 9:31 AM, Josh Patterson <jo...@cloudera.com>
> wrote:
> >>>
> >>>> Urun,
> >>>> Sounds like you have quite a bit of SVM experience. There is always:
> >>>>
> >>>> https://issues.apache.org/jira/browse/MAHOUT-232
> >>>>
> >>>> to take a look at which involves getting SVMs going in Mahout. I've
> >>>> looked at it a bit while working on some smaller patches, I'd be
> >>>> interested in discussing it with you given your experience if you are
> >>>> interested.
> >>>>
> >>>> I can help you get a development env going if and send some tips your
> >>>> way if you have any questions about getting going with developing for
> >>>> Mahout.
> >>>>
> >>>> Josh
> >>>>
> >>>> On Mon, Nov 14, 2011 at 6:04 PM, urun dogan <ur...@gmail.com>
> wrote:
> >>>>> Hi All;
> >>>>>
> >>>>> I want to give my congratulation to all of the contributors of the
> >>>> project.
> >>>>> I found the idea of this project so nice and I want to contribute to
> >>> the
> >>>>> project.
> >>>>>
> >>>>> I am postdoctoral researcher who is involved on developing machine
> >>>> learning
> >>>>> algorithms. During my PhD I have developed several multiclass SVM
> >>>>>
> >>>>> techniques and solvers. Now I am involved in a European Union project
> >>>> which
> >>>>> deals with large scale machine learning problems. I have a 5-6 years
> of
> >>>>>
> >>>>> C++ development experience and I like developing and implementing new
> >>>>> machine learning techniques (Yes I know that Mahout uses Java :) , I
> >>> will
> >>>>> try my best) .
> >>>>>
> >>>>> My main expertise are classification, regression and transfer
> >>> learning. I
> >>>>> have seen several open topics in http://mahout.apache.org/ and these
> >>> are
> >>>>>
> >>>>> 1) Locally Weighted Linear Regression
> >>>>>
> >>>>> 2) Gaussian Discriminative Analysis
> >>>>>
> >>>>> 3) Independent Component Analysis
> >>>>>
> >>>>> 4) Principal Components Analysis
> >>>>>
> >>>>> 5) Classification with Perceptron or Winnow
> >>>>>
> >>>>> 6) Neural Network
> >>>>>
> >>>>> I am aware that in Jira there are also some open issues. I can work
> on
> >>>>> anything. I think that before starting
> >>>>>
> >>>>> any kind of coding I need to take the comments of experts in this
> >>>> project?
> >>>>> What do you recommend to me to start with?
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> Ueruen
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Twitter: @jpatanooga
> >>>> Solution Architect @ Cloudera
> >>>> hadoop: http://www.cloudera.com
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Twitter: @jpatanooga
> > Solution Architect @ Cloudera
> > hadoop: http://www.cloudera.com
>

Re: What to Implement/Improve/Document?

Posted by Raphael Cendrillon <ce...@gmail.com>.
Hi Urun and Josh,

I'd also be interested in helping out in whatever way I can. 

One question, I've noticed that MAHOUT-334 was not ultimately adopted. Do we know the reason for this?

Would it be best to finish out the patch in 232, or instead add the functionality into the existing SGD modules as Ted suggested?


Re: What to Implement/Improve/Document?

Posted by Josh Patterson <jo...@cloudera.com>.
Urun,
I've been looking at MAHOUT-232 and reading Nello Cristianini's book
on SVMs. It sounds like you've done considerably more work than I have
in this arena. I'd be interested in collaborating with you on finishing
out this patch, if you are interested in that type of arrangement (there
is plenty of work to do, and splitting it might be an interesting
path), as it would help in terms of "bandwidth" for both of us.

I can also help you get used to building hadoop, mahout, tools, etc, if needed.

JP




-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com

Re: What to Implement/Improve/Document?

Posted by urun dogan <ur...@gmail.com>.
Dear Josh and Ted;

Both ideas are very attractive. Honestly, I want to do both of them. I am
completely aware that this is quite a lot of work. As I mentioned before, I
am a postdoc now and I am trying to develop new techniques using ASGD.
During my PhD I developed an efficient solver for multiclass SVMs which
uses SMO-based techniques. To compare my solver with others, I implemented
Pegasos for a single-core machine in C++. For both of the methods I have
the theoretical background, and I believe I have enough time for coding
these kinds of techniques.
I would appreciate your supervision. I think that implementing and
optimizing an algorithm for cloud computing is very different from
implementing it for a workstation or desktop PC. As I said, I am willing to
contribute on these issues because these projects fit my experience.
However, if you think that this is too much work for one person, your
comments are welcome. In that case, if you tell me the priority of these
two features, I will implement the most important one first. If nobody has
implemented the second one by the time I finish the first, I will implement
it as well.

Best regards,
Ürün
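For readers unfamiliar with Pegasos (the primal sub-gradient SVM solver mentioned above), the single-machine version is roughly this simple. The sketch below is only an illustration of the standard Pegasos update, not code from MAHOUT-232 or any actual patch; the class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.Random;

/** Rough single-machine sketch of the Pegasos primal SVM solver. */
public class PegasosSketch {

    public static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Trains a linear SVM: at each step, pick one example, shrink w, and
     *  add a sub-gradient step if the example violates the margin. */
    public static double[] train(double[][] x, int[] y, double lambda,
                                 int iterations, long seed) {
        Random rnd = new Random(seed);
        int d = x[0].length;
        double[] w = new double[d];
        for (int t = 1; t <= iterations; t++) {
            int i = rnd.nextInt(x.length);         // one random example
            double eta = 1.0 / (lambda * t);       // Pegasos step size
            double scale = 1.0 - 1.0 / t;          // equals (1 - eta * lambda)
            boolean violates = y[i] * dot(w, x[i]) < 1.0;
            for (int j = 0; j < d; j++) {
                w[j] *= scale;                     // L2 regularization shrink
                if (violates) w[j] += eta * y[i] * x[i][j];
            }
            // Optional projection onto the ball of radius 1/sqrt(lambda)
            double norm = Math.sqrt(dot(w, w));
            double cap = 1.0 / Math.sqrt(lambda);
            if (norm > cap) for (int j = 0; j < d; j++) w[j] *= cap / norm;
        }
        return w;
    }

    public static void main(String[] args) {
        // Tiny separable toy set: label is the sign of (x0 - x1).
        double[][] x = {{2, 1}, {3, 0}, {1, 2}, {0, 3}};
        int[] y = {1, 1, -1, -1};
        double[] w = train(x, y, 0.1, 2000, 42L);
        System.out.println("weights: " + Arrays.toString(w));
    }
}
```

The per-example update is cheap, which is exactly what makes a distributed or streaming version of this kind of solver attractive for Mahout.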





Re: What to Implement/Improve/Document?

Posted by Ted Dunning <te...@gmail.com>.
ASGD is also an opportunity lying on the table.

http://leon.bottou.org/projects/sgd

It would be lovely to have the current SGD system upgraded to use ASGD and
to support multiple loss functions, allowing SVM training as well as the
current logistic regression.  I would be happy to supervise, but can't do
the code right now.
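The upgrade described here, averaged SGD with pluggable loss functions, could be sketched roughly as below. This is a hypothetical self-contained illustration, not Mahout's actual SGD API: the class and method names are invented, and a real patch would build on Mahout's existing vector and learner abstractions.

```java
import java.util.Arrays;

/** Hypothetical sketch of ASGD with a pluggable loss (hinge or logistic). */
public class AsgdSketch {

    /** Derivative of the loss with respect to the margin m = y * w.x, y in {-1,+1}. */
    public interface Loss {
        double gradient(double margin);
    }

    public static final Loss HINGE = m -> (m < 1.0) ? -1.0 : 0.0;         // linear SVM
    public static final Loss LOGISTIC = m -> -1.0 / (1.0 + Math.exp(m));  // logistic regression

    public static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Plain SGD with L2 regularization; wAvg keeps a running average of the
     *  iterates, and that averaged vector is what ASGD returns. */
    public static double[] train(double[][] x, int[] y, Loss loss,
                                 double lambda, double eta0, int epochs) {
        int d = x[0].length;
        double[] w = new double[d];
        double[] wAvg = new double[d];
        int t = 0;
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                t++;
                double eta = eta0 / (1.0 + lambda * eta0 * t);  // decaying step size
                double g = loss.gradient(y[i] * dot(w, x[i]));
                for (int j = 0; j < d; j++) {
                    w[j] -= eta * (lambda * w[j] + g * y[i] * x[i][j]);
                    wAvg[j] += (w[j] - wAvg[j]) / t;            // running average
                }
            }
        }
        return wAvg;
    }

    public static void main(String[] args) {
        // Tiny separable toy set: label is the sign of (x0 - x1).
        double[][] x = {{2, 1}, {3, 0}, {1, 2}, {0, 3}, {4, 1}, {1, 4}};
        int[] y = {1, 1, -1, -1, 1, -1};
        System.out.println("SVM:      " + Arrays.toString(train(x, y, HINGE, 0.01, 0.5, 100)));
        System.out.println("logistic: " + Arrays.toString(train(x, y, LOGISTIC, 0.01, 0.5, 100)));
    }
}
```

The only change needed to move between logistic regression and SVM training here is the Loss implementation; the running average is what distinguishes ASGD from plain SGD.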


Re: What to Implement/Improve/Document?

Posted by Josh Patterson <jo...@cloudera.com>.
Urun,
Sounds like you have quite a bit of SVM experience. There is always:

https://issues.apache.org/jira/browse/MAHOUT-232

to take a look at, which involves getting SVMs going in Mahout. I've
looked at it a bit while working on some smaller patches; I'd be
interested in discussing it with you, given your experience, if you
are interested.

I can help you get a development environment going and send some tips
your way if you have any questions about getting started with
developing for Mahout.

Josh




-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com