Posted to user@mahout.apache.org by "Bae, Jae Hyeon" <me...@gmail.com> on 2011/03/03 03:56:55 UTC

To recover document IDs clustered with LDA

Hi

This might be difficult, but I would rather ask before giving up :)

Is there any method to recover the IDs of documents clustered with LDA? I am
analyzing relationships among several clustering methods, for example, how
many documents are shared between a cluster generated by K-means and one
generated by DBSCAN. I have to do the same with LDA, but since LDA does not
assign each document to exactly one cluster, I don't think it's trivial. I
would like to know how to approach this problem.

If it's impossible to directly extract document IDs from LDA topic sets,
what kind of approach would you recommend?
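
One possible way to frame the comparison, as a minimal sketch: assume the LDA
implementation in use can provide per-document topic proportions (this thread
does not confirm that it does), treat each document's highest-weight topic as
its "cluster", and count the documents shared with each K-means cluster. The
names doc_topics, dominant_topic and shared_counts, and all the data below, are
hypothetical.

```python
from collections import Counter

def dominant_topic(doc_topics):
    """Map each document ID to the index of its highest-weight topic."""
    return {doc_id: max(range(len(weights)), key=weights.__getitem__)
            for doc_id, weights in doc_topics.items()}

def shared_counts(assign_a, assign_b):
    """Count documents for every (cluster in A, cluster in B) pair."""
    common = assign_a.keys() & assign_b.keys()
    return Counter((assign_a[d], assign_b[d]) for d in common)

# Hypothetical per-document topic proportions (assumed available from LDA).
doc_topics = {
    "doc1": [0.7, 0.2, 0.1],
    "doc2": [0.1, 0.8, 0.1],
    "doc3": [0.6, 0.3, 0.1],
    "doc4": [0.2, 0.2, 0.6],
}
# Hypothetical hard assignments from another method such as K-means.
kmeans = {"doc1": "k0", "doc2": "k1", "doc3": "k0", "doc4": "k1"}

lda = dominant_topic(doc_topics)
overlap = shared_counts(lda, kmeans)
for (topic, cluster), n in sorted(overlap.items()):
    print(f"topic {topic} / cluster {cluster}: {n} shared documents")
```

Taking only the dominant topic discards the mixed membership that LDA models,
so this is a simplification made for the sake of comparing against hard
clusterings.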

Thank you

Best, Jay

Re: To recover document IDs clustered with LDA

Posted by Alfred Dimaunahan <al...@fbmsoftware.com>.

I would also like to know the answer to this question; it seems to have been ignored.

Based on my current understanding, it seems that LDA can't identify which
documents are clustered together. If that's the case, after I get the
clusters/topics from a set of documents using LDA, what's a good classifier
to use in order to identify the cluster of each existing document? I need to
do soft clustering, i.e. document 1 is about Clusters 1 and 2. Also, can the
same classifier be used for new documents?
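
As a rough illustration of the soft clustering described above, and not a
statement of what any particular library provides: if per-document topic
proportions are available, a document can be placed in every topic whose weight
reaches a cutoff. The 0.3 threshold, the function names and the data are all
assumptions made for this sketch.

```python
def soft_assign(doc_topics, threshold=0.3):
    """Return, per document, every topic whose weight is >= threshold."""
    return {doc_id: [t for t, w in enumerate(weights) if w >= threshold]
            for doc_id, weights in doc_topics.items()}

def topic_members(soft):
    """Invert the mapping: topic index -> sorted list of document IDs."""
    members = {}
    for doc_id, topics in soft.items():
        for t in topics:
            members.setdefault(t, []).append(doc_id)
    return {t: sorted(ids) for t, ids in members.items()}

# Hypothetical per-document topic proportions (assumed available from LDA).
doc_topics = {
    "doc1": [0.5, 0.45, 0.05],   # mixes topics 0 and 1
    "doc2": [0.1, 0.85, 0.05],
    "doc3": [0.05, 0.05, 0.9],
}

soft = soft_assign(doc_topics)
members = topic_members(soft)
```

The same thresholding would apply to a new document once its topic proportions
have been inferred against the trained model; how to obtain those depends on
the LDA implementation.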

Thanks.

-Alfred
