Posted to dev@mahout.apache.org by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org> on 2011/12/02 23:33:40 UTC

[jira] [Commented] (MAHOUT-399) LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.

    [ https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161884#comment-13161884 ] 

Jake Mannix commented on MAHOUT-399:
------------------------------------

I'm not sure what happened, but the current trunk LDA now fails this test, while the new implementation passes it. I'm marking the old LDA test with @Ignore("MAHOUT-399") so we can track it for now.
                
> LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-399
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-399
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.3, 0.4, 0.5
>         Environment: Mac OS X 10.6.2, Hadoop 0.20.2, Mahout 0.3.
>            Reporter: Michael Lazarus
>            Assignee: Jake Mannix
>              Labels: lda, mahout
>             Fix For: 0.6
>
>         Attachments: 1000docs_26terms_5topics.jpg, MAHOUT-399.diff, Overlapping Pyramids Toy Dataset.pdf, olt.tar
>
>
> Hello,
> Apologies if I have not labeled this correctly.
> I ran a toy LDA problem on Mahout 0.3 (locally), the same one I used to test Blei's C implementation of LDA, which he posts on his site. It has an exact solution that LDA should converge to. Please see the attached PDF, which describes the intended output.
> Is LDA working? The following output suggests some sort of collapsing behavior to me.
> T0 	T1 	T2 	T3 	T4
> x 	w 	x 	u 	x
> u 	u 	g 	j 	n
> l 	r 	i 	m 	l
> j 	q 	h 	h 	p
> v 	p 	e 	i 	q
> e 	t 	f 	g 	v
> d 	s 	d 	f 	o
> b 	c 	b 	n 	k
> y 	f 	c 	l 	m
> w 	v 	u 	v 	u
> c 	d 	p 	y 	t
> k 	o 	l 	r 	r
> i 	b 	j 	k 	j
> f 	e 	k 	e 	f
> g 	x 	y 	s 	y
> t 	y 	w 	b 	w
> h 	i 	s 	p 	s
> o 	l 	v 	x 	d
> q 	j 	t 	d 	i
> n 	k 	o 	t 	b
> The intended output is (again, please see attached):
> D 	I 	N 	S 	X
> d 	i 	n 	s 	x
> c 	h 	m 	t 	y
> e 	j 	o 	r 	w
> b 	k 	l 	u 	v
> f 	g 	p 	q 	a
> a 	f 	k 	p 	b
> g 	l 	q 	v 	u
> h 	m 	j 	w 	t
> y 	u 	r 	o 	c
> n 	s 	d 	d 	i
> s 	e 	x 	f 	f
> r 	q 	i 	i 	n
> m 	v 	w 	c 	o
> o 	w 	u 	a 	h
> q 	n 	s 	h 	g
> p 	t 	c 	x 	d
> t 	x 	f 	e 	l
> x 	d 	e 	j 	s
> w 	y 	g 	b 	j
> i 	r 	y 	n 	r
> u 	o 	h 	y 	m
> k 	b 	t 	l 	e
> v 	c 	a 	m 	k
> j 	a 	b 	g 	p
> l 	p 	v 	k 	q
> What tests do you run to make sure the output is correct?
> Thank you,
> Mike.
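
One rough way to turn the comparison above into an automated check is to test whether each expected peak term (d, i, n, s, x in the intended output) is the top-ranked term of at least one learned topic. The sketch below is only an illustration, not Mahout's actual test: the class TopicPeakCheck and the method missingPeaks are hypothetical names, and it assumes each topic is given as a list of terms already sorted from most to least probable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout: reports which expected peak
// terms are not the top-ranked term of any learned topic.
class TopicPeakCheck {

    // topics: one list per topic, terms ordered from most to least probable
    // (an assumption about the input format).
    // Returns the expected peak terms that no topic ranks first.
    static List<String> missingPeaks(List<List<String>> topics, List<String> expectedPeaks) {
        Set<String> topTerms = new HashSet<>();
        for (List<String> topic : topics) {
            if (!topic.isEmpty()) {
                topTerms.add(topic.get(0));
            }
        }
        List<String> missing = new ArrayList<>();
        for (String peak : expectedPeaks) {
            if (!topTerms.contains(peak)) {
                missing.add(peak);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Top three terms per topic, copied from the reported output (T0..T4).
        List<List<String>> reported = Arrays.asList(
            Arrays.asList("x", "u", "l"),
            Arrays.asList("w", "u", "r"),
            Arrays.asList("x", "g", "i"),
            Arrays.asList("u", "j", "m"),
            Arrays.asList("x", "n", "l"));
        List<String> expectedPeaks = Arrays.asList("d", "i", "n", "s", "x");
        // Prints: Missing peaks: [d, i, n, s]
        System.out.println("Missing peaks: " + missingPeaks(reported, expectedPeaks));
    }
}
```

On the reported output, three of the five topics share the top term x, so four of the five expected peaks go unmatched. That is consistent with the collapsing behavior described in the report. A stricter check would also compare the ordering of the lower-ranked terms, which this sketch deliberately ignores.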
