Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2011/07/01 05:28:44 UTC

Re: questions on the results of running lda and ldatopics, thanks

I think getting the document-topic relationship requires a separate program,
which does not exist.
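
As a rough illustration of what such a separate program might do, here is a minimal Python sketch. It assumes the topic_x files contain lines in the format quoted below (word [p(word|topic_i) = value). The function names and the floor value for unseen words are hypothetical, and the scoring is a naive heuristic (summing log p(word|topic) over a document's tokens), not real LDA inference and not anything Mahout provides.

```python
import math
import re

# Matches lines such as: model [p(model|topic_0) = 0.010358664102351409
LINE_RE = re.compile(r"^(\S+) \[p\((.+)\|(topic_\d+)\) = ([0-9.eE+-]+)")


def parse_topic_lines(lines):
    """Return {word: probability} parsed from ldatopics-style lines."""
    probs = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            probs[m.group(1)] = float(m.group(4))
    return probs


def score_document(tokens, topic_probs, floor=1e-9):
    """Sum log p(word|topic) over the tokens of one document.

    Words missing from the topic get a small floor probability
    (a hypothetical smoothing choice, not part of LDA).
    """
    return sum(math.log(topic_probs.get(t, floor)) for t in tokens)


def best_topic(tokens, topics):
    """topics maps a topic name to {word: prob}; return the best-scoring name."""
    return max(topics, key=lambda name: score_document(tokens, topics[name]))
```

Each document would be tokenized the same way as the LDA input and passed to best_topic along with the parsed topic files. If the topic files list only the top words per topic, most tokens will fall back to the floor value, so treat the result as a crude hint rather than a real document-topic distribution.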

On Thu, Jun 30, 2011 at 12:02 PM, wine lover <wi...@gmail.com> wrote:
> Thanks, Hector, you are right, the exact meaning of topic_i is not necessary
> for unsupervised clustering.
>
> However, in order to cluster a set of documents, I still need to know the
> probabilistic relationship between each topic and each document. It is not
> clear to me how to get this kind of information from the generated result.
>
> For instance, take this line:
> model [p(model|topic_0) = 0.010358664102351409
> Here, model is a word, but the result does not tell me anything about the
> relationship between this word and a given document. Thanks.
>
>
> On Thu, Jun 30, 2011 at 2:08 PM, wine lover <wi...@gmail.com> wrote:
>
>> Hello Everyone,
>>
>> I have two questions on the LDA analysis.
>>
>> After running the lda command, the generated directory "testdata-lda"
>> contains several folders: docTopics  state-0   state-1
>> ....
>>
>> It seems to me that the "state-x" folders are converted into a readable
>> format by running "ldatopics". But what does the "docTopics" folder
>> contain? How can I view it?
>>
>> Running the ldatopics command generates 20 files in total (topic_0,
>> topic_1, etc.). For instance, the topic_0 file contains lines such as the
>> following:
>> model [p(model|topic_0) = 0.010358664102351409
>> tissues [p(tissues|topic_0) = 0.008870984984037485
>>
>> How can I tell what topic_0 stands for? Where can I find this kind of
>> information? Moreover, are there any existing procedures to generate a
>> clustering result based on these topic_x files?
>>
>>
>> Thank you very much for the help.
>>
>> Wenyia
>>
>



-- 
Lance Norskog
goksron@gmail.com