Posted to user@spark.apache.org by aditya1702 <ad...@gmail.com> on 2016/10/24 20:43:52 UTC

Need help with SVM

Hello,
I am using a linear SVM to train my model and fit a separating line through my data.
However, the model predicts 1 for every example. Here is my
code:

print(data_rdd.take(5))
[LabeledPoint(1.0, [1.9643,4.5957]), LabeledPoint(1.0, [2.2753,3.8589]),
LabeledPoint(1.0, [2.9781,4.5651]), LabeledPoint(1.0, [2.932,3.5519]),
LabeledPoint(1.0, [3.5772,2.856])]

----------------------------------------------------------------------------------------
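from pyspark.mllib.classification import SVMWithSGD
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint  # required by the map below
from sklearn.svm import SVC

# Each element of x_df is (features, label); LabeledPoint takes (label, features).
data_rdd = x_df.map(lambda x: LabeledPoint(x[1], x[0]))

model = SVMWithSGD.train(data_rdd, iterations=1000, regParam=1)

# Collect features and labels to the driver.
X = x_df.map(lambda x: x[0]).collect()
Y = x_df.map(lambda x: x[1]).collect()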

----------------------------------------------------------------------------------------
# Predict on the same points the model was trained on.
pred = []
for features in X:
    pred.append(model.predict(features))
print(pred)

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1]


My dataset is as follows:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27955/Screen_Shot_2016-10-25_at_2.png> 


Can someone please help?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-with-SVM-tp27955.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: Need help with SVM

Posted by Robin East <ro...@xense.co.uk>.

As per Aseem's point, what do you get from data_rdd.toDF().groupBy("label").count().show()?
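
A driver-side version of the same label check can be sketched in plain Python. This is only an illustration: label_counts is a hypothetical helper, and the sample labels are made up rather than taken from the poster's data. On an RDD of LabeledPoint, data_rdd.map(lambda p: p.label).countByValue() should give equivalent counts without going through a DataFrame.

```python
from collections import Counter


def label_counts(labels):
    """Return a dict mapping each label to the number of times it occurs."""
    return dict(Counter(labels))


# Made-up labels for illustration. With Spark, the list could instead come from
# data_rdd.map(lambda p: p.label).collect() (an assumption about the poster's setup).
sample_labels = [1.0, 1.0, 1.0, 0.0, 0.0]
counts = label_counts(sample_labels)
print(counts)

# A binary classifier needs both classes in the training data; if the 0.0 key
# were missing, all-1 predictions would be the expected outcome.
print(0.0 in counts and 1.0 in counts)
```

If only one label shows up in the counts, the problem lies in the data preparation rather than in the SVM settings.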




> On 25 Oct 2016, at 15:41, Aseem Bansal <as...@gmail.com> wrote:
> 
> Is there any labeled point with label 0 in your dataset? 
> 

Re: Need help with SVM

Posted by Robin East <ro...@xense.co.uk>.

It looks like the training is over-regularised - dropping the regParam to 0.1 or 0.01 should resolve the problem.

-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action
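
The over-regularisation effect described in this reply can be reproduced with a small plain-Python sketch. It is not MLlib's implementation: train_linear_svm and predict are hypothetical helpers that run full-batch subgradient descent on the hinge loss with an L2 penalty and no intercept, and the two toy points, the constant step size, and the iteration count are assumptions chosen for illustration. Labels follow the 0/1 convention used in the thread.

```python
def train_linear_svm(points, labels, reg_param, step=0.5, iterations=1000):
    """Minimise mean hinge loss + (reg_param / 2) * ||w||^2 with no intercept."""
    dim = len(points[0])
    n = float(len(points))
    w = [0.0] * dim
    for _ in range(iterations):
        # Gradient of the L2 penalty.
        grad = [reg_param * wj for wj in w]
        for x, label in zip(points, labels):
            y = 1.0 if label == 1 else -1.0  # map {0, 1} labels to {-1, +1}
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            if margin < 1.0:  # the point violates the margin
                for j in range(dim):
                    grad[j] -= y * x[j] / n
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Predict 1 when the point lies on the positive side of w, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0.0 else 0


# Toy data (assumed): both points have positive features, as in the thread.
points = [(0.3, 1.2), (0.3, 0.3)]
labels = [1, 0]

for reg_param in (1.0, 0.01):
    w = train_linear_svm(points, labels, reg_param)
    print(reg_param, [predict(w, x) for x in points])
```

With these toy values the heavier penalty keeps the weights so small that both points land on the positive side and are predicted as 1, while the lighter penalty lets the weights grow enough to separate them. The exact regParam at which this changes depends on the scale of the features, so the values here should not be read as recommendations for the real dataset.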





> On 26 Oct 2016, at 11:05, Aseem Bansal <as...@gmail.com> wrote:
> 
> He replied to me. Forwarding to the mailing list. 
> 
> ---------- Forwarded message ----------
> From: Aditya Vyas <adityavyas17@gmail.com>
> Date: Tue, Oct 25, 2016 at 8:16 PM
> Subject: Re: Need help with SVM
> To: Aseem Bansal <asmbansal2@gmail.com>
> 
> 
> Hello,
> Here is the public gist: https://gist.github.com/aditya1702/760cd5c95a6adf2447347e0b087bc318
> 
> Do tell if you need more information
> 
> Regards,
> Aditya
> 

Fwd: Need help with SVM

Posted by Aseem Bansal <as...@gmail.com>.

He replied to me. Forwarding to the mailing list.

---------- Forwarded message ----------
From: Aditya Vyas <ad...@gmail.com>
Date: Tue, Oct 25, 2016 at 8:16 PM
Subject: Re: Need help with SVM
To: Aseem Bansal <as...@gmail.com>


Hello,
Here is the public gist: https://gist.github.com/aditya1702/760cd5c95a6adf2447347e0b087bc318

Do tell if you need more information

Regards,
Aditya

On Tue, Oct 25, 2016 at 8:11 PM, Aseem Bansal <as...@gmail.com> wrote:

> Is there any labeled point with label 0 in your dataset?

Re: Need help with SVM

Posted by Aseem Bansal <as...@gmail.com>.

Is there any labeled point with label 0 in your dataset?
