Posted to dev@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2009/10/08 21:05:47 UTC

LDA for multi label classification was: Mahout Book

Posting to the dev list.

Great paper, thanks! Looks like L-LDA could be used to create some
interesting examples.
The paper shows L-LDA can be used to build a word-tag model for accurate
prediction of a document's tag(s) given its words. I will finish reading
and report back.

How much work is needed to build L-LDA on top of the current LDA
implementation? Any thoughts?

Robin

On Thu, Oct 8, 2009 at 11:50 PM, David Hall <dl...@cs.berkeley.edu> wrote:

> The short answer is that it probably won't help all that much. Naive
> Bayes is unreasonably good when you have enough data.
>
> The long answer is that I have a paper with Dan Ramage and Ramesh
> Nallapati that talks about how to do it.
>
> www.aclweb.org/anthology-new/D/D09/D09-1026.pdf
>
> In some sense, "Labeled-LDA" is a kind of Naive Bayes where you can
> have more than one class per document. If you have exactly one class
> per document, then LDA reduces to Naive Bayes (or the unsupervised
> variant of Naive Bayes, which is basically k-means in multinomial
> space). If instead you wanted to project W words to K topics, with K >
> numWords, then there is something to do...
>
> That something looks like this:
>
> 1) Get p(topic|word,document) for each word in each document (which is
> output by LDAInference). Those are your expected counts for each
> topic.
>
> 2) For each class, do something like:
> p(topic|class) \propto \sum_{document with that class, word}
> p(topic|word,document)
>
> Then just apply Bayes' rule to do classification:
>
> p(class|topics,document) \propto p(class) \prod p(topic|class,document)
>
> -- David
>
> On Thu, Oct 8, 2009 at 11:07 AM, Robin Anil <ro...@gmail.com> wrote:
> > Thanks. Didn't see that; fixed it!
> > I have a query:
> > How is the LDA topic model used to improve a classifier, say Naive
> > Bayes? If it's possible, then I would like to integrate it into Mahout.
> > Given m classes and the associated documents, one can build m topic
> > models, right? (A set of topics (words) under each label and the
> > associated probability distribution of words.)
> > How can I use that info to weight the most relevant topic of a class?
> >
> >
>
> >> LDA has two meanings: linear discriminant analysis and latent
> >> Dirichlet allocation. My code is the latter. The former is a kind of
> >> classification. You say linear discriminant analysis in the outline.
> >>
>
>
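
A minimal Java sketch of the recipe above. This is an illustration rather
than Mahout code: it assumes the per-document matrices p(topic|word,document)
have already been computed by LDA inference (step 1), and all class and
method names here are hypothetical.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Naive Bayes over expected topic counts built from LDA posteriors. */
    public class TopicNaiveBayes {

      private final Map<String, double[]> topicGivenClass = new HashMap<>(); // p(topic|class)
      private final Map<String, Double> classPrior = new HashMap<>();        // p(class)

      /**
       * Step 2 of the recipe: for each class, sum p(topic|word,document)
       * over every (document, word) carrying that class, then normalize.
       * Each entry of docPosteriors is a [wordsInDoc][numTopics] matrix of
       * p(topic|word,document), assumed precomputed.
       */
      public void train(List<double[][]> docPosteriors,
                        List<List<String>> docLabels, int numTopics) {
        Map<String, Integer> classCounts = new HashMap<>();
        for (int d = 0; d < docPosteriors.size(); d++) {
          for (String label : docLabels.get(d)) {
            classCounts.merge(label, 1, Integer::sum);
            double[] counts =
                topicGivenClass.computeIfAbsent(label, k -> new double[numTopics]);
            for (double[] wordPosterior : docPosteriors.get(d)) {
              for (int t = 0; t < numTopics; t++) {
                counts[t] += wordPosterior[t];  // expected count of topic t
              }
            }
          }
        }
        // Normalize expected counts into p(topic|class); estimate p(class).
        for (Map.Entry<String, double[]> e : topicGivenClass.entrySet()) {
          double[] counts = e.getValue();
          double sum = 0;
          for (double c : counts) sum += c;
          for (int t = 0; t < counts.length; t++) counts[t] /= sum;
          classPrior.put(e.getKey(),
              classCounts.get(e.getKey()) / (double) docPosteriors.size());
        }
      }

      /**
       * Bayes' rule over topics: score(class) = log p(class) plus the
       * expected-count-weighted sum of log p(topic|class) for the document.
       */
      public String classify(double[][] docPosterior) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : topicGivenClass.entrySet()) {
          double score = Math.log(classPrior.get(e.getKey()));
          for (double[] wordPosterior : docPosterior) {
            for (int t = 0; t < wordPosterior.length; t++) {
              score += wordPosterior[t] * Math.log(e.getValue()[t] + 1e-12); // smooth log 0
            }
          }
          if (score > bestScore) { bestScore = score; best = e.getKey(); }
        }
        return best;
      }
    }

Because the topic posteriors act as soft counts, this reduces to plain
multinomial Naive Bayes when each word's posterior puts all its mass on a
single topic.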

Re: LDA for multi label classification was: Mahout Book

Posted by David Hall <dl...@cs.berkeley.edu>.
On Fri, Oct 16, 2009 at 4:08 AM, zhao zhendong <zh...@gmail.com> wrote:
> I have seen an implementation of L-LDA in Java, the
> Stanford Topic Modeling Toolbox <http://nlp.stanford.edu/software/tmt/>.
> Does anyone know whether they provide the source code or not?

I'm pretty sure it's Scala, no? It's definitely open source. Like I
said, however, this implementation is almost certainly Gibbs-sampling
based, which has consequences for parallelization (or rather, the
Rao-Blackwellization does).

-- David
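
For context on the parallelization point: "collapsed" (Rao-Blackwellized)
Gibbs sampling for LDA resamples topic indicators one token at a time
against shared count tables that are updated in place, so every draw
depends on every earlier one. A toy Java sketch of the inner update, with
all names hypothetical and the standard LDA hyperparameters alpha and
beta assumed:

    import java.util.Random;

    class CollapsedGibbsStep {
      final int numTopics, numWords;
      final double alpha, beta;     // Dirichlet hyperparameters
      final int[][] topicOfToken;   // [doc][token]  -> current topic assignment
      final int[][] docTopicCounts; // [doc][topic]  -> shared count table
      final int[][] topicWordCounts;// [topic][word] -> shared count table
      final int[] topicTotals;      // [topic]       -> shared count table
      final Random random = new Random();

      CollapsedGibbsStep(int numTopics, int numWords, double alpha, double beta,
                         int[][] topicOfToken, int[][] docTopicCounts,
                         int[][] topicWordCounts, int[] topicTotals) {
        this.numTopics = numTopics; this.numWords = numWords;
        this.alpha = alpha; this.beta = beta;
        this.topicOfToken = topicOfToken; this.docTopicCounts = docTopicCounts;
        this.topicWordCounts = topicWordCounts; this.topicTotals = topicTotals;
      }

      // Resample the topic of one token (document d, word w). The shared
      // count tables are read and written on every single draw; that
      // sequential dependence is what makes the collapsed sampler awkward
      // to distribute.
      void resampleToken(int d, int w, int tokenIndex) {
        int old = topicOfToken[d][tokenIndex];
        docTopicCounts[d][old]--; topicWordCounts[old][w]--; topicTotals[old]--;

        double[] p = new double[numTopics];
        double sum = 0;
        for (int k = 0; k < numTopics; k++) {
          // collapsed conditional: p(z = k | all other assignments)
          p[k] = (docTopicCounts[d][k] + alpha)
               * (topicWordCounts[k][w] + beta) / (topicTotals[k] + numWords * beta);
          sum += p[k];
        }
        double r = random.nextDouble() * sum;  // draw the new topic
        int k = 0;
        while (k < numTopics - 1 && (r -= p[k]) > 0) k++;

        topicOfToken[d][tokenIndex] = k;
        docTopicCounts[d][k]++; topicWordCounts[k][w]++; topicTotals[k]++;
      }
    }

Variational updates, by contrast, touch mostly per-document state given
the global topic-word parameters, which is why the Mahout implementation
distributes more naturally.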
>
> Thanks,
> Maxim
> On Fri, Oct 16, 2009 at 12:39 PM, David Hall <dl...@cs.berkeley.edu> wrote:
>
>> Sorry, this slipped out of my inbox and I just found it!
>>
>> On Thu, Oct 8, 2009 at 12:05 PM, Robin Anil <ro...@gmail.com> wrote:
>> > Posting to the dev list.
>> > Great paper, thanks! Looks like L-LDA could be used to create some
>> > interesting examples.
>>
>> Thanks!
>>
>> > The paper shows L-LDA can be used to build a word-tag model for
>> > accurate prediction of a document's tag(s) given its words. I will
>> > finish reading and report back.
>> > How much work is needed to build L-LDA on top of the current LDA
>> > implementation? Any thoughts?
>>
>> Umm, cool! In the paper we used Gibbs sampling to do the inference,
>> and the implementation in Mahout uses variational inference (because
>> it distributes better). I don't see any obvious problems in terms of
>> math, and so the rest is just fitting it in the system.
>>
>> I think a small amount of refactoring would be in order to make things
>> more generic, and then it shouldn't be too hard to plug in. I'll add
>> it to my list, but I'm swamped for quite some time.
>>
>> -- David
>>
>> > Robin
>> > On Thu, Oct 8, 2009 at 11:50 PM, David Hall <dl...@cs.berkeley.edu>
>> wrote:
>> >>
>> >> The short answer is that it probably won't help all that much. Naive
>> >> Bayes is unreasonably good when you have enough data.
>> >>
>> >> The long answer is that I have a paper with Dan Ramage and Ramesh
>> >> Nallapati that talks about how to do it.
>> >>
>> >> www.aclweb.org/anthology-new/D/D09/D09-1026.pdf
>> >>
>> >> In some sense, "Labeled-LDA" is a kind of Naive Bayes where you can
>> >> have more than one class per document. If you have exactly one class
>> >> per document, then LDA reduces to Naive Bayes (or the unsupervised
>> >> variant of Naive Bayes, which is basically k-means in multinomial
>> >> space). If instead you wanted to project W words to K topics, with K >
>> >> numWords, then there is something to do...
>> >>
>> >> That something looks like this:
>> >>
>> >> 1) Get p(topic|word,document) for each word in each document (which is
>> >> output by LDAInference). Those are your expected counts for each
>> >> topic.
>> >>
>> >> 2) For each class, do something like:
>> >> p(topic|class) \propto \sum_{document with that class, word}
>> >> p(topic|word,document)
>> >>
>> >> Then just apply Bayes' rule to do classification:
>> >>
>> >> p(class|topics,document) \propto p(class) \prod p(topic|class,document)
>> >>
>> >> -- David
>> >>
>> >> On Thu, Oct 8, 2009 at 11:07 AM, Robin Anil <ro...@gmail.com>
>> wrote:
>> >> > Thanks. Didn't see that; fixed it!
>> >> > I have a query:
>> >> > How is the LDA topic model used to improve a classifier, say Naive
>> >> > Bayes? If it's possible, then I would like to integrate it into
>> >> > Mahout.
>> >> > Given m classes and the associated documents, one can build m topic
>> >> > models, right? (A set of topics (words) under each label and the
>> >> > associated probability distribution of words.)
>> >> > How can I use that info to weight the most relevant topic of a class?
>> >> >
>> >> >
>> >>
>> >> >> LDA has two meanings: linear discriminant analysis and latent
>> >> >> Dirichlet allocation. My code is the latter. The former is a kind of
>> >> >> classification. You say linear discriminant analysis in the outline.
>> >> >>
>> >>
>> >
>> >
>>
>
>
>
> --
> -------------------------------------------------------------
>
> Zhen-Dong Zhao (Maxim)
>
> <><<><><><><><><><>><><><><><>>>>>>
>
> Department of Computer Science
> School of Computing
> National University of Singapore
>
>><><><><><><><><><><><><><><><><<<<
> Homepage:http://zhaozhendong.googlepages.com
> Mail: zhaozhendong@gmail.com
>>>>>>>><><><><><><><><<><>><><<<<<<
>

Re: LDA for multi label classification was: Mahout Book

Posted by zhao zhendong <zh...@gmail.com>.
I have seen an implementation of L-LDA in Java, the
Stanford Topic Modeling Toolbox <http://nlp.stanford.edu/software/tmt/>.
Does anyone know whether they provide the source code or not?

Thanks,
Maxim
On Fri, Oct 16, 2009 at 12:39 PM, David Hall <dl...@cs.berkeley.edu> wrote:

> Sorry, this slipped out of my inbox and I just found it!
>
> On Thu, Oct 8, 2009 at 12:05 PM, Robin Anil <ro...@gmail.com> wrote:
> > Posting to the dev list.
> > Great paper, thanks! Looks like L-LDA could be used to create some
> > interesting examples.
>
> Thanks!
>
> > The paper shows L-LDA can be used to build a word-tag model for
> > accurate prediction of a document's tag(s) given its words. I will
> > finish reading and report back.
> > How much work is needed to build L-LDA on top of the current LDA
> > implementation? Any thoughts?
>
> Umm, cool! In the paper we used Gibbs sampling to do the inference,
> and the implementation in Mahout uses variational inference (because
> it distributes better). I don't see any obvious problems in terms of
> math, and so the rest is just fitting it in the system.
>
> I think a small amount of refactoring would be in order to make things
> more generic, and then it shouldn't be too hard to plug in. I'll add
> it to my list, but I'm swamped for quite some time.
>
> -- David
>
> > Robin
> > On Thu, Oct 8, 2009 at 11:50 PM, David Hall <dl...@cs.berkeley.edu>
> wrote:
> >>
> >> The short answer is that it probably won't help all that much. Naive
> >> Bayes is unreasonably good when you have enough data.
> >>
> >> The long answer is that I have a paper with Dan Ramage and Ramesh
> >> Nallapati that talks about how to do it.
> >>
> >> www.aclweb.org/anthology-new/D/D09/D09-1026.pdf
> >>
> >> In some sense, "Labeled-LDA" is a kind of Naive Bayes where you can
> >> have more than one class per document. If you have exactly one class
> >> per document, then LDA reduces to Naive Bayes (or the unsupervised
> >> variant of Naive Bayes, which is basically k-means in multinomial
> >> space). If instead you wanted to project W words to K topics, with K >
> >> numWords, then there is something to do...
> >>
> >> That something looks like this:
> >>
> >> 1) Get p(topic|word,document) for each word in each document (which is
> >> output by LDAInference). Those are your expected counts for each
> >> topic.
> >>
> >> 2) For each class, do something like:
> >> p(topic|class) \propto \sum_{document with that class, word}
> >> p(topic|word,document)
> >>
> >> Then just apply Bayes' rule to do classification:
> >>
> >> p(class|topics,document) \propto p(class) \prod p(topic|class,document)
> >>
> >> -- David
> >>
> >> On Thu, Oct 8, 2009 at 11:07 AM, Robin Anil <ro...@gmail.com>
> wrote:
> >> > Thanks. Didn't see that; fixed it!
> >> > I have a query:
> >> > How is the LDA topic model used to improve a classifier, say Naive
> >> > Bayes? If it's possible, then I would like to integrate it into
> >> > Mahout.
> >> > Given m classes and the associated documents, one can build m topic
> >> > models, right? (A set of topics (words) under each label and the
> >> > associated probability distribution of words.)
> >> > How can I use that info to weight the most relevant topic of a class?
> >> >
> >> >
> >>
> >> >> LDA has two meanings: linear discriminant analysis and latent
> >> >> Dirichlet allocation. My code is the latter. The former is a kind of
> >> >> classification. You say linear discriminant analysis in the outline.
> >> >>
> >>
> >
> >
>



-- 
-------------------------------------------------------------

Zhen-Dong Zhao (Maxim)

<><<><><><><><><><>><><><><><>>>>>>

Department of Computer Science
School of Computing
National University of Singapore

><><><><><><><><><><><><><><><><<<<
Homepage:http://zhaozhendong.googlepages.com
Mail: zhaozhendong@gmail.com
>>>>>>><><><><><><><><<><>><><<<<<<

Re: LDA for multi label classification was: Mahout Book

Posted by David Hall <dl...@cs.berkeley.edu>.
Sorry, this slipped out of my inbox and I just found it!

On Thu, Oct 8, 2009 at 12:05 PM, Robin Anil <ro...@gmail.com> wrote:
> Posting to the dev list.
> Great paper, thanks! Looks like L-LDA could be used to create some
> interesting examples.

Thanks!

> The paper shows L-LDA can be used to build a word-tag model for accurate
> prediction of a document's tag(s) given its words. I will finish reading
> and report back.
> How much work is needed to build L-LDA on top of the current LDA
> implementation? Any thoughts?

Umm, cool! In the paper we used Gibbs sampling to do the inference,
and the implementation in Mahout uses variational inference (because
it distributes better). I don't see any obvious problems in terms of
math, and so the rest is just fitting it in the system.

I think a small amount of refactoring would be in order to make things
more generic, and then it shouldn't be too hard to plug in. I'll add
it to my list, but I'm swamped for quite some time.

-- David
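
As a rough sketch of what "plugging it in" involves: the core change
L-LDA makes to standard LDA is to identify topics one-to-one with labels
and to constrain each document's topic distribution to its own label set.
Inside an inference loop, that comes down to masking the per-word topic
posterior before it is used, as in the toy Java snippet below (all names
hypothetical, not Mahout's actual API):

    import java.util.Set;

    final class LabeledLdaMask {

      private LabeledLdaMask() {}

      /**
       * Constrain a per-word topic posterior to the document's label set:
       * zero the entries for topics the document is not labeled with, then
       * renormalize over the rest. This is the only place the labels enter
       * the otherwise standard LDA update. Assumes at least one labeled
       * topic has nonzero mass.
       */
      static double[] maskToLabels(double[] topicPosterior, Set<Integer> labelTopics) {
        double[] masked = new double[topicPosterior.length];
        double sum = 0;
        for (int k = 0; k < topicPosterior.length; k++) {
          if (labelTopics.contains(k)) {
            masked[k] = topicPosterior[k];
            sum += masked[k];
          }
        }
        for (int k = 0; k < masked.length; k++) {
          masked[k] /= sum;  // renormalize over the allowed (labeled) topics
        }
        return masked;
      }
    }

At training time every word-level update would be masked this way; at
prediction time the mask is dropped and the inferred per-topic weights
are read off as tag scores for the document.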

> Robin
> On Thu, Oct 8, 2009 at 11:50 PM, David Hall <dl...@cs.berkeley.edu> wrote:
>>
>> The short answer is that it probably won't help all that much. Naive
>> Bayes is unreasonably good when you have enough data.
>>
>> The long answer is that I have a paper with Dan Ramage and Ramesh
>> Nallapati that talks about how to do it.
>>
>> www.aclweb.org/anthology-new/D/D09/D09-1026.pdf
>>
>> In some sense, "Labeled-LDA" is a kind of Naive Bayes where you can
>> have more than one class per document. If you have exactly one class
>> per document, then LDA reduces to Naive Bayes (or the unsupervised
>> variant of Naive Bayes, which is basically k-means in multinomial
>> space). If instead you wanted to project W words to K topics, with K >
>> numWords, then there is something to do...
>>
>> That something looks like this:
>>
>> 1) Get p(topic|word,document) for each word in each document (which is
>> output by LDAInference). Those are your expected counts for each
>> topic.
>>
>> 2) For each class, do something like:
>> p(topic|class) \propto \sum_{document with that class, word}
>> p(topic|word,document)
>>
>> Then just apply Bayes' rule to do classification:
>>
>> p(class|topics,document) \propto p(class) \prod p(topic|class,document)
>>
>> -- David
>>
>> On Thu, Oct 8, 2009 at 11:07 AM, Robin Anil <ro...@gmail.com> wrote:
>> > Thanks. Didn't see that; fixed it!
>> > I have a query:
>> > How is the LDA topic model used to improve a classifier, say Naive
>> > Bayes? If it's possible, then I would like to integrate it into
>> > Mahout.
>> > Given m classes and the associated documents, one can build m topic
>> > models, right? (A set of topics (words) under each label and the
>> > associated probability distribution of words.)
>> > How can I use that info to weight the most relevant topic of a class?
>> >
>> >
>>
>> >> LDA has two meanings: linear discriminant analysis and latent
>> >> Dirichlet allocation. My code is the latter. The former is a kind of
>> >> classification. You say linear discriminant analysis in the outline.
>> >>
>>
>
>