You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jegonzal <gi...@git.apache.org> on 2014/10/29 08:36:26 UTC
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/2996
[SPARK-4130][MLlib] Fixing libSVM parser bug with extra whitespace
This simple patch filters out extra whitespace entries.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jegonzal/spark loadLibSVM
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2996.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2996
----
commit e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980
Author: Joseph E. Gonzalez <jo...@gmail.com>
Date: 2014-10-29T07:13:56Z
fixing whitespace bug in loadLibSVMFile when parsing libSVM files
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-61053360
LGTM. Merged into master. If the performance gain is worth the extra code complexity, we can switch to the new implementation. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-60888681
[Test build #22442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22442/consoleFull) for PR 2996 at commit [`e028e84`](https://github.com/apache/spark/commit/e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-60888689
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22442/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-60884323
[Test build #22442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22442/consoleFull) for PR 2996 at commit [`e028e84`](https://github.com/apache/spark/commit/e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980).
* This patch merges cleanly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-61032086
[Test build #22494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22494/consoleFull) for PR 2996 at commit [`e0227ab`](https://github.com/apache/spark/commit/e0227ab0728efefa866864bd09dd0f139c024623).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-61032089
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22494/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-61026446
[Test build #22494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22494/consoleFull) for PR 2996 at commit [`e0227ab`](https://github.com/apache/spark/commit/e0227ab0728efefa866864bd09dd0f139c024623).
* This patch merges cleanly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/2996
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2996#discussion_r19554794
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -76,7 +76,7 @@ object MLUtils {
.map { line =>
val items = line.split(' ')
val label = items.head.toDouble
- val (indices, values) = items.tail.map { item =>
+ val (indices, values) = items.tail.filter( pair => !pair.isEmpty ).map { item =>
--- End diff --
minor (but since we need to run Jenkins again): `filter(_.nonEmpty)` is more readable
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by jegonzal <gi...@git.apache.org>.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-61026000
Not sure why it failed the test. Is this an issue with the testing framework?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Posted by jegonzal <gi...@git.apache.org>.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-61026298
The following implementation seems a bit more efficient but is needlessly complicated.
```scala
// Count the number of empty values
var i = 1
var emptyValues = 0
while (i < items.size) {
if (items(i).isEmpty) emptyValues += 1
i += 1
}
// Determine the number of non-zero entries
val nnzs = items.size - 1 - emptyValues
// Compute the indices
val indices = new Array[Int](nnzs)
val values = new Array[Double](nnzs)
i = 1
var j = 0
while (i < items.size) {
if (!items(i).isEmpty) {
val indexAndValue = items(i).split(':')
indices(j) = indexAndValue(0).toInt - 1 // Convert 1-based indices to 0-based.
values(j) = indexAndValue(1).toDouble
j += 1
}
i += 1
}
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org