Posted to dev@spark.apache.org by Lorenz Fischer <lo...@gmail.com> on 2015/06/03 15:43:13 UTC

MLlib: Anybody working on hierarchical topic models like HLDA?

Hi All

I'm working on a project in which I use the current LDA implementation that
was contributed by Databricks' Joseph Bradley et al. for the recent
1.3.0 release (thanks guys!). While this is great, my project requires
several levels of topics, as I would like to let users drill down into
subtopics.

As I understand it, Hierarchical Latent Dirichlet Allocation (HLDA) would
offer such a hierarchy. Looking at the papers and talks by Blei [1,2] and
Jordan [3], I think I should be able to implement HLDA in Spark using the
Nested Chinese Restaurant Process (NCRP). However, as I have some time
constraints, I'm not sure if I will have the time to do it 'the proper way'.

In any case, I wanted to quickly ask around if anybody is already working
on this or on some other form of a hierarchical topic model. Maybe I could
contribute to these efforts instead of starting from scratch.

Best,
Lorenz

[1] http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordan2009.pdf
[2] http://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese-restaurant-process.pdf
[3] https://www.youtube.com/watch?v=PxgW3lOrj60
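
For readers unfamiliar with the Nested Chinese Restaurant Process (NCRP) named in the message above, here is a minimal single-machine sketch of how a document's path through the topic tree could be drawn from the NCRP prior. It is not Spark code and not the full HLDA sampler from [1,2], which also conditions on the words of each document. The names Node, sample_path, and gamma are hypothetical and chosen only for this illustration.

```python
import random


class Node:
    """One node (restaurant/topic) in the NCRP tree. Hypothetical helper."""

    def __init__(self):
        self.customers = 0  # number of documents whose path passes through this node
        self.children = []  # child nodes, one per occupied table


def sample_path(root, depth, gamma, rng):
    """Draw a root-to-leaf path with `depth` nodes from the NCRP prior.

    At each level an existing child is chosen with probability proportional
    to the number of earlier documents that chose it, and a brand-new child
    with probability proportional to gamma (assumed > 0).
    """
    root.customers += 1
    path = [root]
    node = root
    for _ in range(depth - 1):
        # Counts of earlier documents per child, plus gamma for a new child.
        weights = [child.customers for child in node.children] + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(node.children):
            node.children.append(Node())  # open a new table / subtopic
        node = node.children[choice]
        node.customers += 1
        path.append(node)
    return path


# Usage: assign 50 documents to paths in a 3-level tree.
rng = random.Random(0)
root = Node()
paths = [sample_path(root, 3, 1.0, rng) for _ in range(50)]
```

Larger values of gamma make new branches more likely, so the tree grows wider. A real HLDA implementation would combine this prior with the likelihood of each document's words when resampling paths.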

RE: MLlib: Anybody working on hierarchical topic models like HLDA?

Posted by "Yang, Yuhao" <yu...@intel.com>.
Hi DB Tsai,

Not for now. My primary reference is http://jmlr.csail.mit.edu/proceedings/papers/v15/wang11a/wang11a.pdf

I'm also looking for a way to maximize code reuse. Any suggestions are welcome. Thanks.

Regards,
yuhao

-----Original Message-----
From: DB Tsai [mailto:dbtsai@dbtsai.com] 
Sent: Thursday, June 4, 2015 1:01 PM
To: Yang, Yuhao
Cc: Joseph Bradley; Lorenz Fischer; dev@spark.apache.org
Subject: Re: MLlib: Anybody working on hierarchical topic models like HLDA?

Is your HDP implementation based on distributed Gibbs sampling? Thanks.

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com



Re: MLlib: Anybody working on hierarchical topic models like HLDA?

Posted by DB Tsai <db...@dbtsai.com>.
Is your HDP implementation based on distributed Gibbs sampling? Thanks.

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com


On Wed, Jun 3, 2015 at 8:13 PM, Yang, Yuhao <yu...@intel.com> wrote:
> Hi Lorenz,
>
> I’m trying to build a prototype of HDP for a customer based on the current
> LDA implementations. An initial version will probably be ready within the
> next one or two weeks. I’ll share it and hopefully we can join forces.
>
> One concern is that I’m not sure how widely it will be used in the
> industry or community. Hope it’s popular enough to be accepted by Spark
> MLlib.
>
> http://www.cs.berkeley.edu/~jordan/papers/hierarchical-dp.pdf
> http://jmlr.csail.mit.edu/proceedings/papers/v15/wang11a/wang11a.pdf
>
> Regards,
> Yuhao



RE: MLlib: Anybody working on hierarchical topic models like HLDA?

Posted by "Yang, Yuhao" <yu...@intel.com>.
Hi Lorenz,

  I’m trying to build a prototype of HDP for a customer based on the current LDA implementations. An initial version will probably be ready within the next one or two weeks. I’ll share it and hopefully we can join forces.

  One concern is that I’m not sure how widely it will be used in the industry or community. Hope it’s popular enough to be accepted by Spark MLlib.

http://www.cs.berkeley.edu/~jordan/papers/hierarchical-dp.pdf
http://jmlr.csail.mit.edu/proceedings/papers/v15/wang11a/wang11a.pdf

Regards,
Yuhao

From: Joseph Bradley [mailto:joseph@databricks.com]
Sent: Thursday, June 4, 2015 7:17 AM
To: Lorenz Fischer
Cc: dev@spark.apache.org
Subject: Re: MLlib: Anybody working on hierarchical topic models like HLDA?

Hi Lorenz,

I'm not aware of people working on hierarchical topic models for MLlib, but that would be cool to see.  Hopefully other devs know more!

Glad that the current LDA is helpful!

Joseph

On Wed, Jun 3, 2015 at 6:43 AM, Lorenz Fischer <lo...@gmail.com> wrote:
Hi All

I'm working on a project in which I use the current LDA implementation that has been contributed by Databricks' Joseph Bradley et al. for the recent 1.3.0 release (thanks guys!). While this is great, my project requires several levels of topics, as I would like to offer users to drill down into subtopics.

As I understand it, Hierarchical Latent Dirichlet Allocation (HLDA) would offer such a hierarchy. Looking at the papers and talks by Blei [1,2] and Jordan [3], I think I should be able to implement HLDA in Spark using the Nested Chinese Restaurant Process (NCRP). However, as I have some time constraints, I'm not sure if I will have the time to do it 'the proper way'.

In any case, I wanted to quickly ask around if anybody is already working on this or on some other form of a hierarchical topic model. Maybe I could contribute to these efforts instead of starting from scratch.

Best,
Lorenz

[1] http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordan2009.pdf
[2] http://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese-restaurant-process.pdf
[3] https://www.youtube.com/watch?v=PxgW3lOrj60


Re: MLlib: Anybody working on hierarchical topic models like HLDA?

Posted by Joseph Bradley <jo...@databricks.com>.
Hi Lorenz,

I'm not aware of people working on hierarchical topic models for MLlib, but
that would be cool to see.  Hopefully other devs know more!

Glad that the current LDA is helpful!

Joseph

On Wed, Jun 3, 2015 at 6:43 AM, Lorenz Fischer <lo...@gmail.com>
wrote:

> Hi All
>
> I'm working on a project in which I use the current LDA implementation
> that has been contributed by Databricks' Joseph Bradley et al. for the
> recent 1.3.0 release (thanks guys!). While this is great, my project
> requires several levels of topics, as I would like to offer users to drill
> down into subtopics.
>
> As I understand it, Hierarchical Latent Dirichlet Allocation (HLDA) would
> offer such a hierarchy. Looking at the papers and talks by Blei [1,2] and
> Jordan [3], I think I should be able to implement HLDA in Spark using the
> Nested Chinese Restaurant Process (NCRP). However, as I have some time
> constraints, I'm not sure if I will have the time to do it 'the proper way'.
>
> In any case, I wanted to quickly ask around if anybody is already working
> on this or on some other form of a hierarchical topic model. Maybe I could
> contribute to these efforts instead of starting from scratch.
>
> Best,
> Lorenz
>
> [1] http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordan2009.pdf
> [2] http://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese-restaurant-process.pdf
> [3] https://www.youtube.com/watch?v=PxgW3lOrj60
>