Posted to dev@mahout.apache.org by "Jake Mannix (Reopened) (JIRA)" <ji...@apache.org> on 2011/11/27 06:24:40 UTC

[jira] [Reopened] (MAHOUT-399) LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.

     [ https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Mannix reopened MAHOUT-399:
--------------------------------

      Assignee: Jake Mannix  (was: Grant Ingersoll)

While current trunk Mahout LDA appears to converge correctly on this toy problem, I'm reopening this issue to track the need for a unit test that verifies it.
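
For illustration only (this is not the test in Mahout trunk, and it does not call any Mahout API), a check of this kind could compare the top terms of each learned topic with the expected pyramids from the issue description, allowing for the fact that LDA may return the topics in any order. The function names, the top-5 cutoff, and the use of Jaccard overlap are all assumptions made for this sketch; the two term lists are copied from the first five rows of the tables quoted below.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard overlap between two collections of terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def best_alignment_score(learned, expected):
    """Mean Jaccard overlap under the best one-to-one topic matching.

    Topic labels are arbitrary, so every permutation of the learned
    topics is tried (cheap for 5 topics: 120 permutations).
    """
    k = len(expected)
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(learned[perm[i]], expected[i]) for i in range(k)) / k
        best = max(best, score)
    return best


# Top-5 terms per topic, taken from the tables in the issue description.
EXPECTED_TOP5 = [list("dcebf"), list("ihjkg"), list("nmolp"),
                 list("strqu"), list("xywva")]
MAHOUT_03_TOP5 = [list("xuljv"), list("wurqp"), list("xgihe"),
                  list("ujmhi"), list("xnlpq")]

if __name__ == "__main__":
    # A perfect recovery scores 1.0; the reported 0.3 output scores much lower.
    print(best_alignment_score(EXPECTED_TOP5, EXPECTED_TOP5))
    print(best_alignment_score(MAHOUT_03_TOP5, EXPECTED_TOP5))
```

A unit test built on this idea would train LDA on the toy corpus and assert that the score exceeds some threshold; the right threshold is a judgment call and would need to be chosen against real runs.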
                
> LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-399
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-399
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.3, 0.4, 0.5
>         Environment: Mac OS X 10.6.2, Hadoop 0.20.2, Mahout 0.3.
>            Reporter: Michael Lazarus
>            Assignee: Jake Mannix
>              Labels: lda, mahout
>             Fix For: 0.6
>
>         Attachments: 1000docs_26terms_5topics.jpg, MAHOUT-399.diff, Overlapping Pyramids Toy Dataset.pdf, olt.tar
>
>
> Hello,
> Apologies if I have not labeled this correctly.
> I have run a toy LDA problem on Mahout 0.3 (locally), the same one I used to test the C implementation of LDA that Blei posts on his site. It has an exact solution that LDA should converge to.  Please see the attached PDF, which describes the intended output.
> Is LDA working?  The following output indicates some sort of collapsing behavior to me.
> T0 	T1 	T2 	T3 	T4
> x 	w 	x 	u 	x
> u 	u 	g 	j 	n
> l 	r 	i 	m 	l
> j 	q 	h 	h 	p
> v 	p 	e 	i 	q
> e 	t 	f 	g 	v
> d 	s 	d 	f 	o
> b 	c 	b 	n 	k
> y 	f 	c 	l 	m
> w 	v 	u 	v 	u
> c 	d 	p 	y 	t
> k 	o 	l 	r 	r
> i 	b 	j 	k 	j
> f 	e 	k 	e 	f
> g 	x 	y 	s 	y
> t 	y 	w 	b 	w
> h 	i 	s 	p 	s
> o 	l 	v 	x 	d
> q 	j 	t 	d 	i
> n 	k 	o 	t 	b
> The intended output is (again, please see attached):
> D 	I 	N 	S 	X
> d 	i 	n 	s 	x
> c 	h 	m 	t 	y
> e 	j 	o 	r 	w
> b 	k 	l 	u 	v
> f 	g 	p 	q 	a
> a 	f 	k 	p 	b
> g 	l 	q 	v 	u
> h 	m 	j 	w 	t
> y 	u 	r 	o 	c
> n 	s 	d 	d 	i
> s 	e 	x 	f 	f
> r 	q 	i 	i 	n
> m 	v 	w 	c 	o
> o 	w 	u 	a 	h
> q 	n 	s 	h 	g
> p 	t 	c 	x 	d
> t 	x 	f 	e 	l
> x 	d 	e 	j 	s
> w 	y 	g 	b 	j
> i 	r 	y 	n 	r
> u 	o 	h 	y 	m
> k 	b 	t 	l 	e
> v 	c 	a 	m 	k
> j 	a 	b 	g 	p
> l 	p 	v 	k 	q
> What tests do you run to make sure the output is correct?
> Thank you,
> Mike.
