Posted to user@spark.apache.org by littlebird <cx...@163.com> on 2014/06/07 16:15:21 UTC

How to process multiple classification with SVM in MLlib

Hi All,
  As we know, the SVM in MLlib is used for binary classification. I wonder
how to train an SVM model for multiclass classification in MLlib. In addition,
how can I apply a machine learning algorithm in Spark if the algorithm isn't
included in MLlib? Thank you.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-process-multiple-classification-with-SVM-in-MLlib-tp7174.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to process multiple classification with SVM in MLlib

Posted by littlebird <cx...@163.com>.

Thanks. Now I know how to broadcast the dataset, but I still wonder how,
after broadcasting it, I can apply my algorithm to train the model on the
workers. To describe my question in detail: the following code trains an
LDA (Latent Dirichlet Allocation) model with JGibbLDA on a single machine.
It iterates to sample the topics and train the model. After broadcasting
the dataset, how can I keep this code running in Spark? Thank you.
                // Assumes JGibbLDA's classes are imported from its jgibblda package:
                // import jgibblda.Estimator; import jgibblda.LDACmdOption;
                LDACmdOption ldaOption = new LDACmdOption(); // holds the LDA parameters
                ldaOption.est = true;
                ldaOption.estc = false;
                ldaOption.modelName = "model-final"; // the name of the output file
                ldaOption.dir = "/usr/Java";
                ldaOption.dfile = "newDoc.dat";      // the input data file
                ldaOption.alpha = 0.5;
                ldaOption.beta = 0.1;
                ldaOption.K = 10;                    // the number of topics
                ldaOption.niters = 1000;             // the number of iterations
                topicNum = ldaOption.K;              // topicNum is assumed to be declared elsewhere
                Estimator estimator = new Estimator();
                estimator.init(ldaOption);
                estimator.estimate();




Re: How to process multiple classification with SVM in MLlib

Posted by littlebird <cx...@163.com>.

Someone suggested that I use Mahout, but I'm not familiar with it, and using
it would complicate my program. I'd like to run the algorithm in Spark. I'm
a beginner; can you give me some suggestions?




Re: How to process multiple classification with SVM in MLlib

Posted by Xiangrui Meng <me...@gmail.com>.

For broadcast data, please read
http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables .
For one-vs-all, please read
https://en.wikipedia.org/wiki/Multiclass_classification .

-Xiangrui

On Mon, Jun 9, 2014 at 7:24 AM, littlebird <cx...@163.com> wrote:
> Thank you for your reply. I don't quite understand how to do one-vs-all
> manually for multiclass training. As for the second question, my algorithm
> is implemented in Java and designed for a single machine. How can I
> broadcast the dataset to each worker and train models on the workers?
> Thank you very much.

Re: How to process multiple classification with SVM in MLlib

Posted by littlebird <cx...@163.com>.

Thank you for your reply. I don't quite understand how to do one-vs-all
manually for multiclass training. As for the second question, my algorithm
is implemented in Java and designed for a single machine. How can I
broadcast the dataset to each worker and train models on the workers?
Thank you very much.




Re: How to process multiple classification with SVM in MLlib

Posted by Xiangrui Meng <me...@gmail.com>.

At this time, you need to do one-vs-all manually for multiclass
training. For your second question, if the algorithm is implemented in
Java/Scala/Python and designed for a single machine, you can broadcast
the dataset to each worker and train the models on the workers. If the
algorithm is implemented in a different language, you may need to use
pipe to train the models outside the JVM (similar to Hadoop Streaming).
If the algorithm is designed for a different parallel platform, then it
may be hard to use it in Spark. -Xiangrui
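
To make "one-vs-all manually" concrete, here is a minimal sketch in plain
Java, not MLlib code. It trains one binary model per class by relabeling that
class as +1 and every other class as -1, then predicts the class whose model
gives the highest score. A simple perceptron stands in for the binary SVM so
the example runs without Spark; in MLlib the binary learner would presumably
be something like SVMWithSGD. The names OneVsAllSketch, BinaryModel,
trainBinary, trainOneVsAll and predict, as well as the toy data, are
hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative one-vs-all (one-vs-rest) wrapper around a binary linear classifier. */
class OneVsAllSketch {

    /** Hypothetical binary linear model: score(x) = w . x + b. */
    static final class BinaryModel {
        final double[] w;
        final double b;

        BinaryModel(double[] w, double b) {
            this.w = w;
            this.b = b;
        }

        double score(double[] x) {
            double s = b;
            for (int i = 0; i < w.length; i++) {
                s += w[i] * x[i];
            }
            return s;
        }
    }

    /** Stand-in binary learner (perceptron); ys must contain only +1 and -1. */
    static BinaryModel trainBinary(double[][] xs, int[] ys, int maxEpochs) {
        double[] w = new double[xs[0].length];
        double b = 0.0;
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            boolean mistake = false;
            for (int n = 0; n < xs.length; n++) {
                double s = b;
                for (int i = 0; i < w.length; i++) {
                    s += w[i] * xs[n][i];
                }
                if (ys[n] * s <= 0) { // misclassified (or on the boundary): update
                    for (int i = 0; i < w.length; i++) {
                        w[i] += ys[n] * xs[n][i];
                    }
                    b += ys[n];
                    mistake = true;
                }
            }
            if (!mistake) {
                break; // every training point is on the correct side
            }
        }
        return new BinaryModel(w, b);
    }

    /** Trains one binary model per class: class k is +1, all other classes are -1. */
    static List<BinaryModel> trainOneVsAll(double[][] xs, int[] labels,
                                           int numClasses, int maxEpochs) {
        List<BinaryModel> models = new ArrayList<>();
        for (int k = 0; k < numClasses; k++) {
            int[] ys = new int[labels.length];
            for (int n = 0; n < labels.length; n++) {
                ys[n] = (labels[n] == k) ? 1 : -1;
            }
            models.add(trainBinary(xs, ys, maxEpochs));
        }
        return models;
    }

    /** Predicts the class whose binary model gives the highest score. */
    static int predict(List<BinaryModel> models, double[] x) {
        int best = 0;
        for (int k = 1; k < models.size(); k++) {
            if (models.get(k).score(x) > models.get(best).score(x)) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: three well-separated clusters labeled 0, 1 and 2.
        double[][] xs = {{0, 0}, {1, 0}, {0, 1}, {5, 0}, {6, 0}, {5, 1}, {0, 5}, {0, 6}, {1, 5}};
        int[] labels = {0, 0, 0, 1, 1, 1, 2, 2, 2};
        List<BinaryModel> models = trainOneVsAll(xs, labels, 3, 1000);
        for (int n = 0; n < xs.length; n++) {
            System.out.println("label=" + labels[n] + " predicted=" + predict(models, xs[n]));
        }
    }
}
```

The wrapper does not depend on how the binary models are trained, so the
perceptron could be swapped for any learner that returns a real-valued score.
Comparing raw scores across independently trained models is a common
simplification and is not guaranteed to be well calibrated.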

On Sat, Jun 7, 2014 at 7:15 AM, littlebird <cx...@163.com> wrote:
> Hi All,
>   As we know, the SVM in MLlib is used for binary classification. I wonder
> how to train an SVM model for multiclass classification in MLlib. In addition,
> how can I apply a machine learning algorithm in Spark if the algorithm isn't
> included in MLlib? Thank you.