Posted to dev@mahout.apache.org by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org> on 2011/11/01 17:59:32 UTC

[jira] [Commented] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

    [ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141321#comment-13141321 ] 

Grant Ingersoll commented on MAHOUT-857:
----------------------------------------

Here's the confusion matrix I'm getting, which clearly points to some idiocy on my part:
{quote}

7532 test files
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :        374	    4.9655%
Incorrectly Classified Instances        :       7158	   95.0345%
Total Classified Instances              :       7532

=======================================================
Confusion Matrix
-------------------------------------------------------
a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k    	l    	m    	n    	o    	p    	q    	r    	s    	t    	u    	<--Classified as
123  	0    	1    	1    	1    	2    	6    	19   	2    	2    	5    	23   	27   	8    	53   	3    	14   	17   	12   	0    	0    	 |  319   	a     = alt.atheism
55   	16   	28   	14   	80   	24   	3    	8    	4    	3    	8    	86   	27   	28   	0    	2    	3    	0    	0    	0    	0    	 |  389   	b     = comp.graphics
38   	171  	57   	14   	49   	5    	3    	6    	2    	4    	3    	25   	7    	6    	1    	1    	0    	2    	0    	0    	0    	 |  394   	c     = comp.os.ms-windows.misc
10   	14   	237  	18   	17   	15   	2    	7    	4    	0    	2    	54   	7    	4    	0    	0    	0    	1    	0    	0    	0    	 |  392   	d     = comp.sys.ibm.pc.hardware
20   	10   	55   	159  	17   	20   	7    	11   	5    	0    	1    	63   	13   	2    	0    	1    	0    	1    	0    	0    	0    	 |  385   	e     = comp.sys.mac.hardware
11   	25   	5    	0    	306  	13   	3    	1    	0    	5    	2    	13   	5    	6    	0    	0    	0    	0    	0    	0    	0    	 |  395   	f     = comp.windows.x
2    	1    	23   	14   	6    	310  	1    	3    	3    	1    	1    	10   	6    	5    	0    	3    	0    	1    	0    	0    	0    	 |  390   	g     = misc.forsale
8    	1    	6    	2    	9    	11   	270  	15   	10   	3    	3    	37   	11   	4    	0    	2    	0    	4    	0    	0    	0    	 |  396   	h     = rec.autos
7    	0    	1    	1    	8    	6    	14   	326  	1    	0    	1    	12   	17   	3    	1    	0    	0    	0    	0    	0    	0    	 |  398   	i     = rec.motorcycles
17   	1    	2    	1    	2    	5    	2    	7    	295  	26   	1    	16   	12   	2    	0    	2    	3    	3    	0    	0    	0    	 |  397   	j     = rec.sport.baseball
6    	1    	0    	0    	1    	3    	3    	6    	55   	291  	1    	7    	4    	14   	2    	4    	1    	0    	0    	0    	0    	 |  399   	k     = rec.sport.hockey
22   	2    	0    	3    	5    	3    	0    	3    	2    	1    	293  	24   	12   	7    	0    	4    	2    	13   	0    	0    	0    	 |  396   	l     = sci.crypt
25   	6    	23   	13   	15   	11   	10   	18   	4    	3    	13   	212  	18   	16   	2    	1    	1    	2    	0    	0    	0    	 |  393   	m     = sci.electronics
14   	4    	5    	2    	5    	7    	2    	17   	7    	3    	0    	38   	268  	11   	4    	3    	4    	2    	0    	0    	0    	 |  396   	n     = sci.med
22   	1    	0    	1    	3    	4    	0    	8    	1    	4    	2    	34   	26   	279  	0    	2    	2    	5    	0    	0    	0    	 |  394   	o     = sci.space
43   	1    	2    	4    	0    	4    	1    	11   	4    	1    	0    	9    	33   	8    	249  	2    	5    	14   	7    	0    	0    	 |  398   	p     = soc.religion.christian
21   	0    	0    	1    	3    	3    	2    	12   	6    	2    	3    	10   	16   	5    	1    	235  	4    	40   	0    	0    	0    	 |  364   	q     = talk.politics.guns
41   	0    	0    	2    	1    	1    	5    	3    	3    	7    	0    	10   	12   	5    	1    	8    	250  	27   	0    	0    	0    	 |  376   	r     = talk.politics.mideast
34   	0    	0    	1    	2    	4    	3    	16   	2    	1    	5    	14   	12   	6    	4    	67   	8    	131  	0    	0    	0    	 |  310   	s     = talk.politics.misc
50   	0    	0    	1    	2    	0    	1    	15   	7    	0    	3    	11   	21   	7    	53   	17   	6    	19   	38   	0    	0    	 |  251   	t     = talk.religion.misc
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  0     	u     = DEFAULT
Default Category: DEFAULT: 20
{quote}
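
A quick way to sanity-check the numbers above is to sum the main diagonal (counted as correct) and the first sub-diagonal (one label below the expected one). The sketch below does only that. The rows are copied from the matrix; the variable names are made up for illustration, and treating each row as the actual class and each column as the assigned class is an assumption based on the "<--Classified as" header.

```python
# Rows a..u copied from the confusion matrix above (row totals and labels dropped).
# Assumption: rows are actual classes, columns are assigned classes.
ROWS = """\
123 0 1 1 1 2 6 19 2 2 5 23 27 8 53 3 14 17 12 0 0
55 16 28 14 80 24 3 8 4 3 8 86 27 28 0 2 3 0 0 0 0
38 171 57 14 49 5 3 6 2 4 3 25 7 6 1 1 0 2 0 0 0
10 14 237 18 17 15 2 7 4 0 2 54 7 4 0 0 0 1 0 0 0
20 10 55 159 17 20 7 11 5 0 1 63 13 2 0 1 0 1 0 0 0
11 25 5 0 306 13 3 1 0 5 2 13 5 6 0 0 0 0 0 0 0
2 1 23 14 6 310 1 3 3 1 1 10 6 5 0 3 0 1 0 0 0
8 1 6 2 9 11 270 15 10 3 3 37 11 4 0 2 0 4 0 0 0
7 0 1 1 8 6 14 326 1 0 1 12 17 3 1 0 0 0 0 0 0
17 1 2 1 2 5 2 7 295 26 1 16 12 2 0 2 3 3 0 0 0
6 1 0 0 1 3 3 6 55 291 1 7 4 14 2 4 1 0 0 0 0
22 2 0 3 5 3 0 3 2 1 293 24 12 7 0 4 2 13 0 0 0
25 6 23 13 15 11 10 18 4 3 13 212 18 16 2 1 1 2 0 0 0
14 4 5 2 5 7 2 17 7 3 0 38 268 11 4 3 4 2 0 0 0
22 1 0 1 3 4 0 8 1 4 2 34 26 279 0 2 2 5 0 0 0
43 1 2 4 0 4 1 11 4 1 0 9 33 8 249 2 5 14 7 0 0
21 0 0 1 3 3 2 12 6 2 3 10 16 5 1 235 4 40 0 0 0
41 0 0 2 1 1 5 3 3 7 0 10 12 5 1 8 250 27 0 0 0
34 0 0 1 2 4 3 16 2 1 5 14 12 6 4 67 8 131 0 0 0
50 0 0 1 2 0 1 15 7 0 3 11 21 7 53 17 6 19 38 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"""

matrix = [[int(cell) for cell in line.split()] for line in ROWS.splitlines()]
size = len(matrix)

# Every instance in the matrix, regardless of where it landed.
total = sum(sum(row) for row in matrix)

# Main diagonal: assigned label index equals the row's label index.
diagonal = sum(matrix[i][i] for i in range(size))

# First sub-diagonal: assigned label index is one less than the row's label index.
sub_diagonal = sum(matrix[i][i - 1] for i in range(1, size))

print(f"total instances : {total}")
print(f"main diagonal   : {diagonal} ({diagonal / total:.4%})")
print(f"sub-diagonal    : {sub_diagonal} ({sub_diagonal / total:.4%})")
```

The diagonal sums to 374 of 7532, which matches the summary, while the first sub-diagonal holds 4375 instances. One possible reading, not confirmed here, is that the label indices are shifted by one somewhere between training and testing.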
                
> Rework 20 NewsGroup shell script example to include SGD Example
> ---------------------------------------------------------------
>
>                 Key: MAHOUT-857
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-857
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>         Attachments: MAHOUT-857.patch
>
>
> We have build-20news-bayes.sh, which runs our Naive Bayes (NB) code on 20 Newsgroups. We also have an SGD example that works on 20 Newsgroups, but no script to run it. I'm going to rename build-20news-bayes.sh to classify-20news.sh and have it cover both.
