Posted to user@mahout.apache.org by prasenjit mukherjee <pm...@quattrowireless.com> on 2009/12/02 10:43:49 UTC

Using Restricted Boltzmann for clustering

Just curious to know if anyone has used (or has knowledge of using)
Restricted Boltzmann machines for clustering. (This could be obvious to
most ML experts.)  I am seeing some similarity between SVD and RBMs as
I attempt to interpret the weight matrix (between hidden and visible
neurons) as the U/V matrix vectors.  Do let me know if I am way off.

-Prasen
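[Editorial note: Prasen's U/V analogy can be made concrete at the shape
level: a rank-k truncated SVD and an RBM's term-to-hidden weight matrix
both map the same term space into a k-dimensional latent space; the RBM
just does it through a nonlinearity. A minimal numpy sketch, where the
matrix sizes and the untrained weight matrix are purely illustrative:]

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 50))          # toy doc x term matrix, values in [0, 1]
k = 10                             # latent dimension / number of hidden units

# SVD: rank-k projection of documents onto the top right-singular vectors
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_svd = X @ Vt[:k].T             # shape (100, k)

# RBM view: a (here untrained) weight matrix plays the same structural
# role, mapping terms -> hidden units, but through a sigmoid nonlinearity
W = rng.normal(scale=0.01, size=(50, k))
b = np.zeros(k)
doc_rbm = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden-unit probabilities

print(doc_svd.shape, doc_rbm.shape)   # both (100, 10)
```

Both representations are (docs x k); the RBM's extra nonlinearity is what
makes the "nonlinear SVD" description apt.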

Re: Using Restricted Boltzmann for clustering

Posted by Jake Mannix <ja...@gmail.com>.
On Fri, Dec 4, 2009 at 8:07 PM, prasenjit mukherjee <
pmukherjee@quattrowireless.com> wrote:

> I am indeed learning via CD technique, but using only a single layer,
> where the computation of neurons toggles between the same set of
> visible and hidden layers. Guess I was too optimistic and was
> expecting results  in first RBM itself.
>

  Have you read Semantic Hashing (Salakhutdinov and Hinton)?
<http://www.cs.utoronto.ca/%7Ehinton/absps/sh.pdf> (pdf link)
They give a good explanation of why the multi-layered approach is so
necessary.  One layer of an RBM is not much more than a simple
old-fashioned single-layer neural net, which, from what I remember, has
never been very good at this kind of thing.  Not only are multiple
layers needed, but fine-tuning by old-fashioned gradient descent after
the CD pretraining is also necessary.

  -jake
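[Editorial note: the stacking recipe Jake describes can be sketched in
numpy: train one RBM with CD-1, feed its hidden probabilities to the next
layer as data, and shrink the width as you go up. This is an illustrative
sketch only; the learning rate, epoch count, and layer sizes are made-up
assumptions, and the backprop fine-tuning stage he mentions is omitted.]

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=10, seed=0):
    """One binary-binary RBM trained with CD-1 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        # positive phase: hidden probabilities and a binary sample
        h_prob = sigmoid(data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # negative phase: one Gibbs step back down and up
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 updates: data statistics minus reconstruction statistics
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
        b_v += lr * (data - v_recon).mean(axis=0)
    return W, b_h

# Greedy stacking: each layer's hidden probabilities become the next
# layer's "visible" data, shrinking the layer width as we go up.
rng = np.random.default_rng(1)
layer_input = (rng.random((200, 64)) < 0.5).astype(float)
weights = []
for n_hidden in (32, 16, 8):
    W, b_h = train_rbm(layer_input, n_hidden)
    layer_input = sigmoid(layer_input @ W + b_h)
    weights.append(W)

print([w.shape for w in weights])   # [(64, 32), (32, 16), (16, 8)]
```

In the full recipe, the stacked weights would then initialize a deep
autoencoder that is fine-tuned end-to-end with gradient descent.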

> I believe in stacked RBMs you repeat the same thing for more than one
> layer, where results of current hidden layers get passed on as visible
> layer to a new RBM with new hidden layer and you again do Contrastive
> Divergence in that new RBM. You could possibly have different number
> of hidden neurons at each layer. Makes sense to keep reducing the
> number of hidden neurons at each subsequent layers.
>
> Anyways, will see if I can try out with multiple layers and see any
> improvements.
>
> -Thanks,
> Prasen
>

Re: Using Restricted Boltzmann for clustering

Posted by prasenjit mukherjee <pm...@quattrowireless.com>.
I am indeed learning via the CD technique, but using only a single
layer, where the computation of neurons toggles between the same set of
visible and hidden layers.  I guess I was too optimistic and was
expecting results from the first RBM itself.

I believe that in stacked RBMs you repeat the same thing for more than
one layer: the outputs of the current hidden layer get passed on as the
visible layer to a new RBM with a new hidden layer, and you again do
Contrastive Divergence in that new RBM.  You could possibly have a
different number of hidden neurons at each layer; it makes sense to
keep reducing the number of hidden neurons at each subsequent layer.

Anyway, I will see if I can try it out with multiple layers and whether
there are any improvements.

-Thanks,
Prasen
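[Editorial note: the visible/hidden "toggling" described above is the
alternating Gibbs chain inside a single RBM. A tiny numpy sketch with
made-up sizes and untrained random weights, just to show the alternation:]

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 20, 8
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

# Start the chain at a (random) data vector and toggle visible <-> hidden.
v = (rng.random(n_visible) < 0.5).astype(float)
for _ in range(3):                  # CD-k would use k such toggles
    h = (rng.random(n_hidden) < sigmoid(v @ W + b_h)).astype(float)
    v = (rng.random(n_visible) < sigmoid(h @ W.T + b_v)).astype(float)

print(v.shape, h.shape)   # (20,) (8,)
```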

On Fri, Dec 4, 2009 at 11:25 PM, Jake Mannix <ja...@gmail.com> wrote:
> Prasen,
>
>  I thought the whole point of doing the RBM approach to autoencoders /
> dimensional
> reduction was to do the stacked approach, since you don't need to do full
> convergence
> layer by layer, but instead do the layer-by-layer "contrastive divergence"
> technique
> which Hinton advocates, and then do fine-tuning at the end?  I wouldn't
> imagine
> you'd get very good relevance on a single layer.
>
>  -jake
>

Re: Using Restricted Boltzmann for clustering

Posted by Jake Mannix <ja...@gmail.com>.
Prasen,

  I thought the whole point of the RBM approach to autoencoders /
dimensionality reduction was the stacked approach, since you don't need
to do full convergence layer by layer, but can instead do the
layer-by-layer "contrastive divergence" technique which Hinton
advocates, and then do fine-tuning at the end?  I wouldn't imagine
you'd get very good relevance with a single layer.

  -jake

On Fri, Dec 4, 2009 at 8:37 AM, prasenjit mukherjee <
pmukherjee@quattrowireless.com> wrote:

> I did try out on some sample data where my visible layer was Linear
> and hidden layer was StochasticBinary.   Using a single layer RBM
> didnt give me great results. I guess I should try out the stacked RBM
> approach.
>
> BTW, Anybody used single layer RBM on a doc X term probability matrix
> ( aka Continuous visible layer )  with values 0-1 for collaborative
> filtering ?
>
> -Prasen
>

Re: Using Restricted Boltzmann for clustering

Posted by prasenjit mukherjee <pm...@quattrowireless.com>.
I did try it out on some sample data where my visible layer was Linear
and my hidden layer was StochasticBinary.  Using a single-layer RBM
didn't give me great results.  I guess I should try the stacked-RBM
approach.

BTW, has anybody used a single-layer RBM on a doc x term probability
matrix (i.e. a continuous visible layer) with values in [0, 1] for
collaborative filtering?

-Prasen
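[Editorial note: one common shortcut for [0, 1]-valued inputs is to feed
the real values in directly as visible-unit probabilities, rather than
sampling binary visible states from them. A hypothetical numpy fragment;
the sizes and the row normalization are illustrative, not from jarbm or
Mahout:]

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)

# Toy doc x term matrix with entries in [0, 1] (e.g. row-normalized tf)
X = rng.random((50, 20))
X /= X.sum(axis=1, keepdims=True)

W = rng.normal(scale=0.01, size=(20, 5))
b_h = np.zeros(5)

# Shortcut: use the [0, 1] values directly as visible probabilities in
# the positive phase instead of sampling binary visible states.
h_prob = sigmoid(X @ W + b_h)
h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
print(h_prob.shape)   # (50, 5)
```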

On Thu, Dec 3, 2009 at 12:40 AM, Olivier Grisel
<ol...@ensta.org> wrote:
> Hi,
>
> I have some C + python code for stacking autoencoders which share
> similar features as DBN (stacked RBM) here:
> http://bitbucket.org/ogrisel/libsgd/wiki/Home
>
> This is still pretty much work in progress, I will let you know when I
> have easy to run sample demos.
>
> However, this algo is not trivially mapreducable but I plan to
> investigate on that matters in the coming weeks. Would be nice to have
> a pure JVM version too. I am also planning to play with clojure +
> incanter (with the parallelcolt library as a backend for linear
> algebra) to make it easier to work with Hadoop.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://code.oliviergrisel.name
>

Re: Using Restricted Boltzmann for clustering

Posted by prasenjit mukherjee <pm...@quattrowireless.com>.
I am using http://sourceforge.net/projects/jarbm. Not sure how easily
the algorithms can be made map-reducible.

Great to hear a similar viewpoint.  Although it is a bit too early, it
seems that the negative weights (in an RBM) have a better
interpretation, one which is not there in SVD: if you consider each
hidden neuron as a cluster, the negative weights tend to specify the
denial of a particular cluster when that feature is present.

Again, these are just some observations on a preliminary set of data,
and I would definitely appreciate any kind of supporting theory.

-Prasen
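[Editorial note: the "denial" reading of negative weights can be checked
mechanically: for each hidden unit, sort its incoming weights and look
at the most negative terms. A toy numpy sketch with a made-up vocabulary
and random, not learned, weights, just to show the bookkeeping:]

```python
import numpy as np

rng = np.random.default_rng(3)
terms = ["apache", "hadoop", "cricket", "football", "svd", "rbm"]

# Hypothetical learned weight matrix: terms x hidden units.
W = rng.normal(size=(len(terms), 3))

# For each hidden unit ("cluster"), the most negative weights mark the
# terms whose presence most strongly vetoes membership in that cluster.
for h in range(W.shape[1]):
    veto = np.argsort(W[:, h])[:2]
    print(f"hidden unit {h}: vetoed by {[terms[i] for i in veto]}")
```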

On Thu, Dec 3, 2009 at 12:40 AM, Olivier Grisel
<ol...@ensta.org> wrote:
> Hi,
>
> I have some C + python code for stacking autoencoders which share
> similar features as DBN (stacked RBM) here:
> http://bitbucket.org/ogrisel/libsgd/wiki/Home
>
> This is still pretty much work in progress, I will let you know when I
> have easy to run sample demos.
>
> However, this algo is not trivially mapreducable but I plan to
> investigate on that matters in the coming weeks. Would be nice to have
> a pure JVM version too. I am also planning to play with clojure +
> incanter (with the parallelcolt library as a backend for linear
> algebra) to make it easier to work with Hadoop.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://code.oliviergrisel.name
>

Re: Using Restricted Boltzmann for clustering

Posted by Olivier Grisel <ol...@ensta.org>.
2009/12/2 Jake Mannix <ja...@gmail.com>:
> Prasen,
>
>  I was just talking about this on here last week.  Yes, RBM-based
> clustering can be viewed as
> a nonlinear SVD.  I'm pretty interested in your findings on this.  Do you
> have any RBM code you
> care to contribute to Mahout?

Hi,

I have some C + Python code for stacking autoencoders, which share
similar features with DBNs (stacked RBMs), here:
http://bitbucket.org/ogrisel/libsgd/wiki/Home

This is still very much a work in progress; I will let you know when I
have easy-to-run sample demos.

However, this algo is not trivially map-reducible, but I plan to
investigate that matter in the coming weeks.  It would be nice to have
a pure-JVM version too.  I am also planning to play with Clojure +
Incanter (with the Parallel Colt library as a backend for linear
algebra) to make it easier to work with Hadoop.

-- 
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name

Re: Using Restricted Boltzmann for clustering

Posted by Jake Mannix <ja...@gmail.com>.
Prasen,

  Geoff Hinton has given quite a few great talks on using RBMs as
autoencoders over the past couple of years.  Some of them can be found
on YouTube, while others have ppts/pdfs on his website.  I'm not sure
which one to recommend for the specifics of the relationship between
this and other forms of dimensionality reduction, but he certainly
mentions it in his Science article
(2006) <http://www.cs.toronto.edu/%7Ehinton/science.pdf>,
available on his website <http://www.cs.toronto.edu/%7Ehinton>.

  -jake



On Wed, Dec 2, 2009 at 7:29 PM, prasenjit mukherjee <
pmukherjee@quattrowireless.com> wrote:

> This is interesting and what I was expecting. Any articles/ppt/talks
> you can refer on the theory of  connection between SVD/RBMs
>

Re: Using Restricted Boltzmann for clustering

Posted by prasenjit mukherjee <pm...@quattrowireless.com>.
This is interesting, and what I was expecting.  Are there any
articles/ppts/talks you can refer me to on the theory of the connection
between SVD and RBMs?

On Wed, Dec 2, 2009 at 9:23 PM, Jake Mannix <ja...@gmail.com> wrote:
> Prasen,
>
>  I was just talking about this on here last week.  Yes, RBM-based
> clustering can be viewed as
> a nonlinear SVD.  I'm pretty interested in your findings on this.  Do you
> have any RBM code you
> care to contribute to Mahout?
>

Re: Using Restricted Boltzmann for clustering

Posted by Jake Mannix <ja...@gmail.com>.
Prasen,

  I was just talking about this on here last week.  Yes, RBM-based
clustering can be viewed as a nonlinear SVD.  I'm pretty interested in
your findings on this.  Do you have any RBM code you care to contribute
to Mahout?

  -jake

On Wed, Dec 2, 2009 at 1:43 AM, prasenjit mukherjee <
pmukherjee@quattrowireless.com> wrote:

> Just curious to know  if anyone has used ( or have knowledge of  using
> )  Restricted Boltzmann for clustering. ( Could be obvious to most of
> ML experts )  I am seeing some similarity between SVD/RBM as I attempt
> to interpret the weightmatrix ( between hidden/visible neurons ) as
> U/V matrix vectors.   Do let me know if I am way off.
>
> -Prasen
>