Posted to reviews@spark.apache.org by ericl <gi...@git.apache.org> on 2015/12/16 09:48:52 UTC

[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/10323

    [SPARK-12346] [ML] Missing attribute names in GLM for vector-type features

    Currently `summary()` fails on a GLM model fitted over a vector feature that is missing ML attributes, because the output feature attributes then have no names either. We can avoid this by making `VectorAssembler` generate suitable names when its inputs are missing them.
    
    cc @mengxr 
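
    As a rough illustration of the fallback described above, here is a minimal, Spark-free sketch. The object `AttrNameSketch`, the helper `slotNames`, and the `<column>_<index>` naming scheme are assumptions made for this example and are not taken from the patch; the real change lives in `VectorAssembler` and works on ML attribute metadata rather than plain strings.

```scala
// Hypothetical model of the fallback naming idea; not the code from the patch.
object AttrNameSketch {

  /** Returns one name per slot of a vector input column.
    *
    * @param column   name of the input column (e.g. "vec")
    * @param existing per-slot names already attached to the column;
    *                 None marks a slot that has no name
    */
  def slotNames(column: String, existing: Seq[Option[String]]): Seq[String] =
    existing.zipWithIndex.map {
      // Keep a name that is already present.
      case (Some(name), _) => name
      // Assumed scheme: make up "<column>_<index>" for an unnamed slot.
      case (None, i) => s"${column}_$i"
    }

  def main(args: Array[String]): Unit = {
    // A two-element vector column with no attribute names at all.
    println(slotNames("vec", Seq(None, None)).mkString(", "))
  }
}
```

    With every slot named, a downstream `summary()` has a label for each coefficient instead of failing on a missing name.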

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark spark-12346

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10323
    
----
commit dcb5a40e1806244139aa7adde338a3c4c2c5eda4
Author: Eric Liang <ek...@databricks.com>
Date:   2015-12-16T08:34:41Z

    vec attrs

commit 1c66cdd9f197545bb10c6b4b670e7ee5c195fbc1
Author: Eric Liang <ek...@databricks.com>
Date:   2015-12-16T08:47:33Z

    Wed Dec 16 00:47:33 PST 2015

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10323


[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10323#issuecomment-165063423
  
    **[Test build #47806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47806/consoleFull)** for PR 10323 at commit [`1c66cdd`](https://github.com/apache/spark/commit/1c66cdd9f197545bb10c6b4b670e7ee5c195fbc1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on the pull request:

    https://github.com/apache/spark/pull/10323#issuecomment-169425244
  
    @ericl this looks great, thanks!


[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10323#discussion_r47758991
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
    @@ -143,6 +143,44 @@ class RFormulaSuite extends SparkFunSuite with MLlibTestSparkContext {
         assert(attrs === expectedAttrs)
       }
     
    +  test("vector attribute generation") {
    +    val formula = new RFormula().setFormula("id ~ vec")
    +    val original = sqlContext.createDataFrame(
    +      Seq((1, Vectors.dense(0.0, 1.0)), (2, Vectors.dense(1.0, 2.0)))
    --- End diff --
    
    Should we support a `term` of type vector in an R formula?


[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/10323#issuecomment-172650949
  
    Merged into master and branch-1.6. Thanks! I created https://issues.apache.org/jira/browse/SPARK-12886 to track some follow-up tasks.


[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10323#issuecomment-165038820
  
    **[Test build #47806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47806/consoleFull)** for PR 10323 at commit [`1c66cdd`](https://github.com/apache/spark/commit/1c66cdd9f197545bb10c6b4b670e7ee5c195fbc1).


[GitHub] spark pull request: [SPARK-12346] [ML] Missing attribute names in ...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10323#discussion_r47823898
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
    @@ -143,6 +143,44 @@ class RFormulaSuite extends SparkFunSuite with MLlibTestSparkContext {
         assert(attrs === expectedAttrs)
       }
     
    +  test("vector attribute generation") {
    +    val formula = new RFormula().setFormula("id ~ vec")
    +    val original = sqlContext.createDataFrame(
    +      Seq((1, Vectors.dense(0.0, 1.0)), (2, Vectors.dense(1.0, 2.0)))
    --- End diff --
    
    I think it makes sense when using RFormula in an ML pipeline (not necessarily in R).

