Posted to dev@mahout.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2010/01/02 03:31:54 UTC
[jira] Commented: (MAHOUT-232) Implementation of sequential SVM solver based on Pegasos
[ https://issues.apache.org/jira/browse/MAHOUT-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795813#action_12795813 ]
Ted Dunning commented on MAHOUT-232:
------------------------------------
The 0.1 patch compiles for me, but the 0.2 patch produces this problem:
{noformat}
/Users/tdunning/Apache/mahout-trunk/core/src/main/java/org/apache/mahout/classifier/svm/DataSetHandler.java:[195,8] cannot find symbol
symbol : variable HDFSConfig
location: class org.apache.mahout.classifier.svm.DataSetHandler
/Users/tdunning/Apache/mahout-trunk/core/src/main/java/org/apache/mahout/classifier/svm/DataSetHandler.java:[244,8] cannot find symbol
symbol : variable HDFSConfig
location: class org.apache.mahout.classifier.svm.DataSetHandler
{noformat}
It seems that something has been dropped from the patch.
> Implementation of sequential SVM solver based on Pegasos
> --------------------------------------------------------
>
> Key: MAHOUT-232
> URL: https://issues.apache.org/jira/browse/MAHOUT-232
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Affects Versions: 0.2
> Reporter: zhao zhendong
> Attachments: SequentialSVM_0.1.patch, SequentialSVM_0.2.patch
>
>
> After discussing with people in this community, I decided to re-implement a sequential SVM solver based on Pegasos for the Mahout platform (Mahout command-line style, SparseMatrix, SparseVector, etc.). Eventually, it will support HDFS.
> Sequential SVM based on Pegasos.
> Maxim zhao (zhaozhendong at gmail dot com)
> -------------------------------------------------------------------------------------------
> Currently, this package provides (Features):
> -------------------------------------------------------------------------------------------
> 1. Sequential linear SVM solver, including training and testing.
> 2. Supports general file systems and HDFS right now.
> 3. Supports large-scale data sets.
> Because Pegasos only needs to draw a limited number of samples, this package pre-fetches a fixed number of samples (e.g. the maximum number of iterations) into memory.
> For example: if the data set has 100,000,000 samples, then because the default maximum number of iterations is 10,000, it randomly loads 10,000 samples into memory.
> 4. Sequential testing over the data set, so the package can support large-scale data sets in both training and testing.
> -------------------------------------------------------------------------------------------
> TODO:
> -------------------------------------------------------------------------------------------
> 1. HDFS write function for storing the model file to HDFS.
> 2. Parallel testing algorithm based on the MapReduce framework.
> 3. Regression.
> 4. Multi-classification.
> -------------------------------------------------------------------------------------------
> Usage:
> -------------------------------------------------------------------------------------------
> Training:
> SVMPegasosTraining.java
> I have hard-coded the arguments in this file. If you want to customize the arguments yourself, please uncomment the first line in the main function.
> The default arguments are:
> -tr ../examples/src/test/resources/svmdataset/train.dat -m ../examples/src/test/resources/svmdataset/SVM.model
> [For the case where the training data set is on HDFS:]
> >>>>>>>
> 1 Ensure that your training data set has been uploaded to HDFS
> hadoop-work-space# bin/hadoop fs -ls path-of-train-dataset
> 2 Revise the arguments:
> -tr /user/hadoop/train.dat -m ../examples/src/test/resources/svmdataset/SVM.model -hdfs hdfs://localhost:12009
> >>>>>>>
> Testing:
> SVMPegasosTesting.java
> I have hard-coded the arguments in this file. If you want to customize the arguments yourself, please uncomment the first line in the main function.
> The default arguments are:
> -te ../examples/src/test/resources/svmdataset/test.dat -m ../examples/src/test/resources/svmdataset/SVM.model
> -------------------------------------------------------------------------------------------
> Experimental Results:
> -------------------------------------------------------------------------------------------
> Data set:
> name           | source  | type           | classes | training size | testing size | features
> -----------------------------------------------------------------------------------------------
> rcv1.binary    | [DL04b] | classification | 2       | 20,242        | 677,399      | 47,236
> covtype.binary | UCI     | classification | 2       | 581,012       |              | 54
> a9a            | UCI     | classification | 2       | 32,561        | 16,281       | 123
> w8a            | [JP98a] | classification | 2       | 49,749        | 14,951       | 300
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Data set       | Accuracy | Training Time | Testing Time |
> rcv1.binary    | 94.67%   | 19 Sec        | 2 min 25 Sec |
> covtype.binary |          | 19 Sec        |              |
> a9a            | 84.72%   | 14 Sec        | 12 Sec       |
> w8a            | 89.8%    | 14 Sec        | 8 Sec        |
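On the pre-fetching described in feature 3 of the quoted description: one common way to draw a fixed number of samples uniformly at random from a data set too large to hold in memory is reservoir sampling, which needs only a single pass over the data. The sketch below is only an illustration under that assumption. The patch may well select its samples differently, and `SamplePrefetch` and `reservoirSample` are hypothetical names, not classes or methods from the patch or from Mahout.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of the MAHOUT-232 patch. */
public class SamplePrefetch {

    /**
     * Returns a uniform random subset of at most k items from a stream
     * whose length is not known in advance (reservoir sampling, Algorithm R).
     */
    public static <T> List<T> reservoirSample(Iterator<T> stream, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(Math.max(k, 0));
        long seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Pick a slot in [0, seen); the new item replaces an
                // existing one only if the slot falls inside the reservoir,
                // which happens with probability k / seen.
                long slot = (long) (rng.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, item);
                }
            }
        }
        return reservoir;
    }
}
```

With k set to the maximum number of iterations (10,000 by default according to the description), memory use stays bounded by k regardless of how many samples the training file contains.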