You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jegonzal <gi...@git.apache.org> on 2014/10/29 08:36:26 UTC

[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

GitHub user jegonzal opened a pull request:

    https://github.com/apache/spark/pull/2996

    [SPARK-4130][MLlib] Fixing libSVM parser bug with extra whitespace 

    This simple patch filters out extra whitespace entries.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jegonzal/spark loadLibSVM

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2996
    
----
commit e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980
Author: Joseph E. Gonzalez <jo...@gmail.com>
Date:   2014-10-29T07:13:56Z

    fixing whitespace bug in loadLibSVMFile when parsing libSVM files

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-61053360
  
    LGTM. Merged into master. If the performance gain is worth the extra code complexity, we can switch to the new implementation. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-60888681
  
      [Test build #22442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22442/consoleFull) for   PR 2996 at commit [`e028e84`](https://github.com/apache/spark/commit/e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-60888689
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22442/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-60884323
  
      [Test build #22442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22442/consoleFull) for   PR 2996 at commit [`e028e84`](https://github.com/apache/spark/commit/e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-61032086
  
      [Test build #22494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22494/consoleFull) for   PR 2996 at commit [`e0227ab`](https://github.com/apache/spark/commit/e0227ab0728efefa866864bd09dd0f139c024623).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-61032089
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22494/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-61026446
  
      [Test build #22494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22494/consoleFull) for   PR 2996 at commit [`e0227ab`](https://github.com/apache/spark/commit/e0227ab0728efefa866864bd09dd0f139c024623).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2996


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2996#discussion_r19554794
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
    @@ -76,7 +76,7 @@ object MLUtils {
           .map { line =>
             val items = line.split(' ')
             val label = items.head.toDouble
    -        val (indices, values) = items.tail.map { item =>
    +        val (indices, values) = items.tail.filter( pair => !pair.isEmpty ).map { item =>
    --- End diff --
    
    minor (but since we need to run Jenkins again): `filter(_.nonEmpty)` is more readable


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by jegonzal <gi...@git.apache.org>.
Github user jegonzal commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-61026000
  
    Not sure why it failed the test.  Is this an issue with the testing framework?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

Posted by jegonzal <gi...@git.apache.org>.
Github user jegonzal commented on the pull request:

    https://github.com/apache/spark/pull/2996#issuecomment-61026298
  
    The following implementation seems a bit more efficient but is needlessly complicated.
    
    ```scala
          // Count the number of empty values                                                                                                                                                                     
            var i = 1
            var emptyValues = 0
            while (i < items.size) {
              if (items(i).isEmpty) emptyValues += 1
              i += 1
            }
            // Determine the number of non-zero entries                                                                                                                                                             
            val nnzs = items.size - 1 - emptyValues
            // Compute the indices                                                                                                                                                                                  
            val indices = new Array[Int](nnzs)
            val values = new Array[Double](nnzs)
            i = 1
            var j = 0
            while (i < items.size) {
              if (!items(i).isEmpty) {
                val indexAndValue = items(i).split(':')
                indices(j) = indexAndValue(0).toInt - 1 // Convert 1-based indices to 0-based.                                                                                                                      
                values(j) = indexAndValue(1).toDouble
                j += 1
              }
              i += 1
            }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org