Posted to reviews@spark.apache.org by ajtulloch <gi...@git.apache.org> on 2014/05/11 01:31:44 UTC
[GitHub] spark pull request: SPARK-1791 - SVM implementation does not use t...
GitHub user ajtulloch opened a pull request:
https://github.com/apache/spark/pull/725
SPARK-1791 - SVM implementation does not use threshold parameter
Summary:
https://issues.apache.org/jira/browse/SPARK-1791
Simple fix, and backward compatible, since
- anyone who set the threshold was getting completely wrong answers.
- anyone who did not set the threshold had the default 0.0 value for the threshold anyway.
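
For illustration, the behaviour described above can be sketched in plain Scala. This is a minimal sketch and not the code from the patch: `LinearSvmSketch`, `margin`, and `predictIgnoringThreshold` are hypothetical names, and it assumes the model labels a point by comparing the raw margin against the threshold.

```scala
// Hypothetical stand-in for a linear SVM model; not Spark's actual class.
final case class LinearSvmSketch(
    weights: Array[Double],
    intercept: Double,
    threshold: Double = 0.0) {

  // Raw margin: dot product of weights and features, plus the intercept.
  def margin(features: Array[Double]): Double = {
    require(features.length == weights.length, "dimension mismatch")
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept
  }

  // Assumed shape of the old behaviour: the margin is compared against a
  // hard-coded 0.0, so a user-supplied threshold has no effect.
  def predictIgnoringThreshold(features: Array[Double]): Double =
    if (margin(features) < 0.0) 0.0 else 1.0

  // Assumed shape of the fix: the margin is compared against `threshold`.
  def predict(features: Array[Double]): Double =
    if (margin(features) < threshold) 0.0 else 1.0
}
```

With the default threshold of 0.0 both variants return the same label, which is consistent with the backward-compatibility argument above. They only differ once a caller sets a non-zero threshold.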
Test Plan:
Unit test added that is verified to fail under the old implementation,
and pass under the new implementation.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ajtulloch/spark SPARK-1791-SVM
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/725.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #725
----
commit 6f7075a75927b08c3c1211642632e03c2246f5cd
Author: Andrew Tulloch <an...@tullo.ch>
Date: 2014-05-10T23:22:02Z
SPARK-1791 - SVM implementation does not use threshold parameter
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-42758415
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14879/
---
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/725#discussion_r12594467
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/SVMSuite.scala ---
@@ -69,6 +69,43 @@ class SVMSuite extends FunSuite with LocalSparkContext {
assert(numOffPredictions < input.length / 5)
}
+ test("SVM with threshold") {
+ val nPoints = 10000
+
+ // NOTE: Intercept should be small for generating equal 0s and 1s
+ val A = 0.01
+ val B = -1.5
+ val C = 1.0
+
+ val testData = SVMSuite.generateSVMInput(A, Array[Double](B,C), nPoints, 42)
+
+ val testRDD = sc.parallelize(testData, 2)
+ testRDD.cache()
+
+ val svm = new SVMWithSGD().setIntercept(true)
+ svm.optimizer.setStepSize(1.0).setRegParam(1.0).setNumIterations(100)
+
+ val model = svm.run(testRDD)
+
+ val validationData = SVMSuite.generateSVMInput(A, Array[Double](B,C), nPoints, 17)
--- End diff --
Ditto.
---
Posted by ajtulloch <gi...@git.apache.org>.
Github user ajtulloch commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-43020578
@mengxr - thanks for the comments, I've updated the PR with the fixes. Please also have a look at https://github.com/apache/spark/pull/726, which cleans up formatting, code duplication, etc. in the `SVMSuite.scala` file.
---
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-43029884
LGTM. Thanks! I will take a look at #726 today.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-43020506
Merged build triggered.
---
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/725#discussion_r12594596
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/SVMSuite.scala ---
@@ -69,6 +69,43 @@ class SVMSuite extends FunSuite with LocalSparkContext {
assert(numOffPredictions < input.length / 5)
}
+ test("SVM with threshold") {
+ val nPoints = 10000
+
+ // NOTE: Intercept should be small for generating equal 0s and 1s
+ val A = 0.01
+ val B = -1.5
+ val C = 1.0
+
+ val testData = SVMSuite.generateSVMInput(A, Array[Double](B,C), nPoints, 42)
+
+ val testRDD = sc.parallelize(testData, 2)
+ testRDD.cache()
+
+ val svm = new SVMWithSGD().setIntercept(true)
+ svm.optimizer.setStepSize(1.0).setRegParam(1.0).setNumIterations(100)
+
+ val model = svm.run(testRDD)
+
+ val validationData = SVMSuite.generateSVMInput(A, Array[Double](B,C), nPoints, 17)
+ val validationRDD = sc.parallelize(validationData, 2)
+
+ // Test prediction on RDD.
+
+ var predictions = model.predict(validationRDD.map(_.features)).collect()
+ assert(predictions.count {_ == 0.0 } != predictions.length)
--- End diff --
Change to `count(_ == 0.0)`, which is more common in Spark.
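
As an illustration of the two spellings discussed here, the following plain-Scala sketch shows that they compute the same value, so the choice is stylistic. The sample `predictions` array is made up for the example and is not taken from the test.

```scala
// Made-up sample of predicted labels (0.0 or 1.0).
val predictions = Array(0.0, 1.0, 1.0, 0.0, 1.0)

// Brace form, as written in the diff under review.
val zerosBraces = predictions.count { _ == 0.0 }

// Parenthesized form, as suggested in the review comment.
val zerosParens = predictions.count(_ == 0.0)

// The assertion from the diff, in the suggested style:
// not every prediction is 0.0.
assert(zerosParens != predictions.length)
```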
---
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-42981423
ah, I made this ...
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-42757678
Merged build started.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-43020526
Merged build started.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-42757676
Merged build triggered.
---
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-43029934
Thanks. Merged this into master & branch-1.0.
---
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/725#discussion_r12594375
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/SVMSuite.scala ---
@@ -69,6 +69,43 @@ class SVMSuite extends FunSuite with LocalSparkContext {
assert(numOffPredictions < input.length / 5)
}
+ test("SVM with threshold") {
+ val nPoints = 10000
+
+ // NOTE: Intercept should be small for generating equal 0s and 1s
+ val A = 0.01
+ val B = -1.5
+ val C = 1.0
+
+ val testData = SVMSuite.generateSVMInput(A, Array[Double](B,C), nPoints, 42)
--- End diff --
Please put a space after the `,` that follows `B`.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-43023787
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14949/
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-42758411
Merged build finished. All automated tests passed.
---
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/725
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/725#issuecomment-43023786
Merged build finished. All automated tests passed.
---