Posted to dev@mahout.apache.org by Aditya Sarawgi <sa...@gmail.com> on 2012/03/01 06:31:32 UTC

psvm

Hello,

I am looking to implement psvm for Mahout as part of my coursework.
The reference paper is
http://books.nips.cc/papers/files/nips20/NIPS2007_0435.pdf
and there is an implementation at http://code.google.com/p/psvm/ which
uses MPI.
Any ideas or pointers are much appreciated.

Thanks
Aditya Sarawgi

Re: psvm

Posted by "Edward J. Yoon" <ed...@apache.org>.

> Btw on the benchmarks page, it's not clear if the tests are for Hadoop or
> Hama.

It shows BSP computing performance of Apache Hama.

Hadoop is only used as a filesystem.

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: psvm

Posted by Aditya Sarawgi <sa...@gmail.com>.

Hi,

This does look interesting. Please give me some time to read up on it
and evaluate whether it's feasible to implement psvm using BSP.
By the way, on the benchmarks page, it's not clear whether the tests are
for Hadoop or Hama. Am I missing something?

Thanks
Aditya Sarawgi

Re: psvm

Posted by Thomas Jungblut <tj...@apache.org>.

Hi Aditya,

I'm from the Apache Hama team; we are working on a BSP (Bulk Synchronous
Parallel) engine.
BSP is quite like MPI, just with two primitives (barrier sync and message
send). I don't know if that is enough for your algorithm, but I would be
very interested in implementing it with BSP and Apache Hama.
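
To make the two primitives concrete, here is a minimal single-process
sketch in Python. It is a toy model, not the Apache Hama API: the names
BSPSimulator, send, sync, and global_sum are hypothetical and chosen only
for illustration. Messages sent during a superstep become readable only
after the barrier.

```python
from collections import defaultdict


class BSPSimulator:
    """Single-process toy model of BSP: peers only send messages and sync."""

    def __init__(self, num_peers):
        self.num_peers = num_peers
        self.inbox = defaultdict(list)   # messages readable in this superstep
        self.outbox = defaultdict(list)  # messages queued during this superstep

    def send(self, to_peer, message):
        # Primitive 1: message send. Delivery is deferred to the next barrier.
        self.outbox[to_peer].append(message)

    def sync(self):
        # Primitive 2: barrier sync. All queued messages become visible at once.
        self.inbox = self.outbox
        self.outbox = defaultdict(list)


def global_sum(local_values):
    """Every peer learns the sum of all local values."""
    sim = BSPSimulator(len(local_values))
    # Superstep 1: each peer sends its value to every peer (itself included).
    for peer, value in enumerate(local_values):
        for other in range(sim.num_peers):
            sim.send(other, value)
    sim.sync()
    # Superstep 2: each peer sums what arrived in its own inbox.
    return [sum(sim.inbox[peer]) for peer in range(sim.num_peers)]
```

Calling global_sum([1, 2, 3]) gives every simulated peer the same total,
which is the kind of collective an MPI program would get from a reduction.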

I have already implemented k-means clustering with BSP [1], which is much
faster than the MapReduce implementation [2].
I plan to contribute it to Mahout over the next few months, since I think
BSP is a missing part of large-scale machine learning (currently I just see
MapReduce implementations everywhere). Your work would help give Mahout
another good example of BSP and machine learning.
And it would of course help me to convince the Mahout team to use
Apache Hama ;)

If you are interested, I'd be glad to hear from you.

Best regards,
Thomas

[1]
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/clustering/KMeansBSP.java

[2]  http://wiki.apache.org/hama/Benchmarks (scroll down a bit)
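
As a rough illustration of how k-means can map onto BSP supersteps, here
is a small Python sketch. It is not taken from the KMeansBSP code linked
in [1]; the function names and the simulated "messages" list are
assumptions made for the example. Each peer computes partial sums over
its own partition, and after the barrier every peer merges all partial
sums into the same new centers.

```python
def nearest(point, centers):
    # Index of the center with the smallest squared Euclidean distance.
    return min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
    )


def local_partial_sums(points, centers):
    # Per-peer work: sum and count of the local points assigned to each center.
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in points:
        k = nearest(point, centers)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += point[d]
    return sums, counts


def kmeans_bsp(partitions, centers, iterations=10):
    for _ in range(iterations):
        # Before the barrier: each peer works only on its own partition.
        messages = [local_partial_sums(part, centers) for part in partitions]
        # After the barrier: every peer holds all messages and merges them.
        new_centers = []
        for k in range(len(centers)):
            count = sum(c[k] for _, c in messages)
            if count == 0:
                new_centers.append(centers[k])  # leave empty clusters in place
                continue
            dim = len(centers[k])
            total = [sum(s[k][d] for s, _ in messages) for d in range(dim)]
            new_centers.append([t / count for t in total])
        centers = new_centers
    return centers
```

Only the per-center sums and counts cross the barrier, so the message
size depends on the number of centers rather than the number of points.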

Re: psvm

Posted by Aditya Sarawgi <sa...@gmail.com>.

Okay, I see the point now. Let me dig a bit deeper and I will get back soon.
Thanks for the comments.

--
Aditya Sarawgi

Re: psvm

Posted by Ted Dunning <te...@gmail.com>.

No.  By linear SVM, I mean an SVM that does not use the kernel trick.

This is like logistic regression SGD with a different gradient function;
the idea is otherwise the same.  And yes, this is a convex problem.
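
A small Python sketch of that comparison, assuming labels in {-1, +1}, an
L2 regularizer, and hypothetical function names: the SGD loop is shared,
and only the per-example gradient function changes between logistic
regression and a linear SVM (hinge loss).

```python
import math
import random


def hinge_gradient(w, x, y, lam):
    # Subgradient of lam/2 * ||w||^2 + max(0, 1 - y * w.x).
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1:
        return [lam * wi - y * xi for wi, xi in zip(w, x)]
    return [lam * wi for wi in w]


def logistic_gradient(w, x, y, lam):
    # Gradient of lam/2 * ||w||^2 + log(1 + exp(-y * w.x)).
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = 1.0 / (1.0 + math.exp(margin))
    return [lam * wi - scale * y * xi for wi, xi in zip(w, x)]


def sgd(data, gradient, lam=0.01, rate=0.1, epochs=50, seed=0):
    # Same loop for both models; `gradient` is the only thing that differs.
    rng = random.Random(seed)
    data = list(data)
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            g = gradient(w, x, y, lam)
            w = [wi - rate * gi for wi, gi in zip(w, g)]
    return w
```

Usage: sgd(data, hinge_gradient) trains the linear SVM and
sgd(data, logistic_gradient) trains logistic regression on the same list
of (features, label) pairs. Both objectives are convex in w.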

Re: psvm

Posted by Aditya Sarawgi <sa...@gmail.com>.

So if I understand correctly, you mean that instead of having multiple
layers of SVMs, I have just one layer that trains an SVM on each individual
dataset, and in the reducer I get the optimum of all of them. But is that
guaranteed to give a global optimum?

-- 
Cheers,
Aditya Sarawgi

Re: psvm

Posted by Ted Dunning <te...@gmail.com>.

For linear SVM, gradient descent is a fine algorithm.  If you go into this
work, I would recommend that you implement an all-reduce operation since
iterated map-reduce is very inefficient.
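
A minimal single-process sketch of that idea in Python, under stated
assumptions: hinge loss with labels in {-1, +1}, an L2 regularizer, and a
hypothetical all_reduce_sum helper that stands in for a real network
all-reduce. Each worker computes a gradient on its own partition, the
gradients are summed so that every worker receives the same total, and
all workers apply the same update without restarting a job per iteration.

```python
def local_gradient(w, partition):
    # Sum of hinge-loss subgradients over one worker's examples.
    grad = [0.0] * len(w)
    for x, y in partition:
        if y * sum(wi * xi for wi, xi in zip(w, x)) < 1:
            for d, xd in enumerate(x):
                grad[d] -= y * xd
    return grad


def all_reduce_sum(vectors):
    # Stand-in for a real all-reduce: every worker gets the same summed vector.
    total = [sum(column) for column in zip(*vectors)]
    return [list(total) for _ in vectors]


def distributed_gd(partitions, dim, lam=0.01, rate=0.1, iterations=100):
    n = sum(len(p) for p in partitions)
    # Each worker keeps its own weight copy; the copies stay identical
    # because every worker applies the same reduced gradient.
    weights = [[0.0] * dim for _ in partitions]
    for _ in range(iterations):
        grads = [local_gradient(w, p) for w, p in zip(weights, partitions)]
        reduced = all_reduce_sum(grads)
        weights = [
            [wi - rate * (lam * wi + gi / n) for wi, gi in zip(w, g)]
            for w, g in zip(weights, reduced)
        ]
    return weights
```

In this sketch the model state lives in the workers between iterations,
which is the part an iterated map-reduce formulation would have to reload
on every pass.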

Re: psvm

Posted by Aditya Sarawgi <sa...@gmail.com>.

Hi,

Thanks, Todd, for the pointer. I actually had one more paper in mind, and
it's from the original author of SVM:
http://leon.bottou.org/publications/pdf/nips-2004c.pdf

I think this makes more sense for MapReduce. I am open to other suggestions
or algorithms.

Thanks
Aditya Sarawgi

-- 
Cheers,
Aditya Sarawgi

Re: psvm

Posted by Todd Johnson <jo...@gmail.com>.

The authors of that paper don't believe their algorithm is a good candidate
for mapreduce. See:
http://groups.google.com/group/psvm/browse_thread/thread/cedd3a6caef0f9c9#

todd.
