Posted to reviews@spark.apache.org by BenFradet <gi...@git.apache.org> on 2016/04/22 18:34:09 UTC

[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

GitHub user BenFradet opened a pull request:

    https://github.com/apache/spark/pull/12614

    [SPARK-14730][ML] Expose ColumnPruner as feature transformer

    ## What changes were proposed in this pull request?
    
    The column pruner transformer present in RFormula was made public
    
    ## How was this patch tested?
    
    Unit tests were written
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BenFradet/spark SPARK-14730

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12614
    
----
commit 66658d29d3aaee9033be21e0bdadae72110ac9eb
Author: BenFradet <be...@gmail.com>
Date:   2016-04-21T13:34:25Z

    first version of the column pruner

commit 83866cfa7e3f7535855a74b3998bc996f6e7c649
Author: BenFradet <be...@gmail.com>
Date:   2016-04-21T13:34:41Z

    removed ColumnPruner from RFormula

commit dd425d20417895e84acd3a94f9ecf1f8af4ba5bb
Author: BenFradet <be...@gmail.com>
Date:   2016-04-22T15:34:23Z

    checkCanTransform method

commit 0f311ec29cbe42a3c64c122ed28be6b226b30026
Author: BenFradet <be...@gmail.com>
Date:   2016-04-22T16:32:27Z

    test suite for the column pruner transformer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-214515143
  
    **[Test build #56917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56917/consoleFull)** for PR 12614 at commit [`318863b`](https://github.com/apache/spark/commit/318863b7f4b9211d7d4d055573bca7354148ee9b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #12614: [SPARK-14730][ML] Expose ColumnPruner as feature transfo...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the issue:

    https://github.com/apache/spark/pull/12614
  
    @yanboliang should I close this?




[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-214503773
  
    **[Test build #56917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56917/consoleFull)** for PR 12614 at commit [`318863b`](https://github.com/apache/spark/commit/318863b7f4b9211d7d4d055573bca7354148ee9b).




[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-213501606
  
    **[Test build #56706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56706/consoleFull)** for PR 12614 at commit [`0f311ec`](https://github.com/apache/spark/commit/0f311ec29cbe42a3c64c122ed28be6b226b30026).




[GitHub] spark issue #12614: [SPARK-14730][ML] Expose ColumnPruner as feature transfo...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the issue:

    https://github.com/apache/spark/pull/12614
  
    @manugarri I don't think there was much interest in this PR.
    
    Would you care to weigh in @jkbradley @yanboliang?




[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-214515386
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56917/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-213547473
  
    **[Test build #56706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56706/consoleFull)** for PR 12614 at commit [`0f311ec`](https://github.com/apache/spark/commit/0f311ec29cbe42a3c64c122ed28be6b226b30026).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #12614: [SPARK-14730][ML] Expose ColumnPruner as feature transfo...

Posted by manugarri <gi...@git.apache.org>.
Github user manugarri commented on the issue:

    https://github.com/apache/spark/pull/12614
  
    @davireis that is a similar use case to mine.




[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-213548019
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56706/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-213548016
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #12614: [SPARK-14730][ML] Expose ColumnPruner as feature transfo...

Posted by davireis <gi...@git.apache.org>.
Github user davireis commented on the issue:

    https://github.com/apache/spark/pull/12614
  
    Just weighing in on the motivations:
    
    https://0xdata.atlassian.net/browse/SW-224
    http://apache-spark-developers-list.1001551.n3.nabble.com/spark-ml-Why-is-private-class-ColumnPruner-td16863.html
    
    And my own use case: I have a DataFrame with two text columns, and I want to apply an LDAModel to each of them. The model was trained on a different dataset, and although I can reset its input column (setFeaturesCol), I cannot reset its output column (there is no setTopicDistributionCol on the trained model). Since both applications of the LDAModel write to the same output column name, my pipeline fails.
    
    If I had ColumnPruner, I could combine it with SQLTransformer to rename the output column. Alternatively, LDAModel itself could be fixed, or I could build a WithColumnRenamedTransformer. But I believe ColumnPruner would suffice as a primitive for many use cases, since most other simple schema manipulations can be achieved with SQLTransformer. Maybe I am missing some alternatives that are already in place, but from what I understand, I can currently achieve what I want only with a custom transformer.
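
To make the use case above concrete, here is a minimal sketch in plain Python. It is a toy model and not Spark code: a "DataFrame" is a list of dicts, and the classes `FixedOutputModel`, `CopyColumn`, and `ColumnPruner` are hypothetical stand-ins written for illustration only. `FixedOutputModel` imitates a fitted model whose output column cannot be changed, `CopyColumn` plays the role of a SQLTransformer statement such as `SELECT *, topicDistribution AS titleTopics FROM __THIS__`, and `ColumnPruner` drops the colliding column so the model can be applied a second time.

```python
# Toy model only: rows are dicts and a "DataFrame" is a list of rows.
# None of these classes are Spark APIs; all names are hypothetical.


class FixedOutputModel:
    """Stand-in for a fitted model with a settable input column
    but a hard-coded output column."""

    OUTPUT_COL = "topicDistribution"

    def __init__(self, input_col):
        self.input_col = input_col

    def transform(self, rows):
        # Refuse to overwrite an existing column, which is the collision
        # described in the comment above.
        if rows and self.OUTPUT_COL in rows[0]:
            raise ValueError(f"column {self.OUTPUT_COL!r} already exists")
        # A word count is used as a placeholder for the real model output.
        return [
            {**row, self.OUTPUT_COL: len(row[self.input_col].split())}
            for row in rows
        ]


class CopyColumn:
    """Plays the role of 'SELECT *, source AS target FROM __THIS__'."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def transform(self, rows):
        return [{**row, self.target: row[self.source]} for row in rows]


class ColumnPruner:
    """Drops the listed columns from every row."""

    def __init__(self, columns_to_prune):
        self.columns_to_prune = set(columns_to_prune)

    def transform(self, rows):
        return [
            {k: v for k, v in row.items() if k not in self.columns_to_prune}
            for row in rows
        ]


def run_pipeline(stages, rows):
    """Applies each stage in order, like a fitted pipeline would."""
    for stage in stages:
        rows = stage.transform(rows)
    return rows


rows = [{"title": "spark ml", "body": "expose the column pruner"}]

# Copy the fixed output column to a new name, then prune the original,
# so the second application of the model no longer collides.
stages = [
    FixedOutputModel("title"),
    CopyColumn("topicDistribution", "titleTopics"),
    ColumnPruner(["topicDistribution"]),
    FixedOutputModel("body"),
    CopyColumn("topicDistribution", "bodyTopics"),
    ColumnPruner(["topicDistribution"]),
]
result = run_pipeline(stages, rows)
print(result)
```

Running the two `FixedOutputModel` stages back to back, without the copy and prune steps between them, raises the `ValueError` in this sketch. That mirrors the failure described above and shows why a pruning primitive is enough once renaming is available.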




[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-213611990
  
    @yanboliang 
    Could you please take a look at this when you have the chance? Thanks.




[GitHub] spark pull request: [SPARK-14730][ML] Expose ColumnPruner as featu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12614#issuecomment-214515383
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #12614: [SPARK-14730][ML] Expose ColumnPruner as feature transfo...

Posted by manugarri <gi...@git.apache.org>.
Github user manugarri commented on the issue:

    https://github.com/apache/spark/pull/12614
  
    Any word on when this is going to be available in PySpark?




[GitHub] spark pull request #12614: [SPARK-14730][ML] Expose ColumnPruner as feature ...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet closed the pull request at:

    https://github.com/apache/spark/pull/12614

