You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2017/01/04 12:24:26 UTC

[GitHub] spark pull request #16471: [SPARK-19078] PCAModel.transform avoid extra vect...

GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/16471

    [SPARK-19078] PCAModel.transform avoid extra vector conversion

    ## What changes were proposed in this pull request?
    
    As suggested in the source, avoid the vector conversion in `PCAModel.transform()`
    
    ## How was this patch tested?
    existing tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark pca_no_conversion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16471.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16471
    
----
commit 0ea37af50ca962300d629f70e79fad5dd3e15481
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2017-01-04T09:52:51Z

    create pr

commit 6b3f28c907f9ee52dca6a6c6053b18fae9c9e103
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2017-01-04T09:59:34Z

    del one empty line

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    @imatiach-msft Thanks for your suggestion and reviewing. I start this PR because I found that in source same `// TODO: Make the transformer natively in ml framework to avoid extra conversion.` appears in several places.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16471: [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,Sta...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng closed the pull request at:

    https://github.com/apache/spark/pull/16471


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    **[Test build #70918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70918/testReport)** for PR 16471 at commit [`ef20dbe`](https://github.com/apache/spark/commit/ef20dbe272139c26701e294995586d2f093f75fc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    **[Test build #70915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70915/testReport)** for PR 16471 at commit [`3e175a9`](https://github.com/apache/spark/commit/3e175a996bf6d415264470dafd5a4fc9afbffa6a).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    I think you might need to add [ML] to the pull request name, eg:
    [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,StandardScaler,PCA transform avoid extra vector conversion
    
    I like the changes but I'm not sure if we want to directly port the code instead of using the ml lib code, since now if there are bugs in ml or mllib a developer would have to make changes in two places instead of one.  I'm a new contributor myself so I'm not sure what is traditionally recommended in cases like this.  It looks like in some cases it should be possible to refactor duplicate code to one place and have both ml and mllib call into the same method?  Maybe that would address my concern.  Thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    **[Test build #70872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70872/testReport)** for PR 16471 at commit [`6b3f28c`](https://github.com/apache/spark/commit/6b3f28c907f9ee52dca6a6c6053b18fae9c9e103).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    **[Test build #70915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70915/testReport)** for PR 16471 at commit [`3e175a9`](https://github.com/apache/spark/commit/3e175a996bf6d415264470dafd5a4fc9afbffa6a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    **[Test build #70872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70872/testReport)** for PR 16471 at commit [`6b3f28c`](https://github.com/apache/spark/commit/6b3f28c907f9ee52dca6a6c6053b18fae9c9e103).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,StandardSc...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    @sethah Yes. I will close the duplicated PR and JIRA, and help to reivewing that PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,StandardSc...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    Is this a duplicate of [SPARK-18385](https://issues.apache.org/jira/browse/SPARK-18385). Can you please see the discussion on the PR: https://github.com/apache/spark/pull/15831/files


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70918/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70915/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    cc @yanboliang 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    **[Test build #70918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70918/testReport)** for PR 16471 at commit [`ef20dbe`](https://github.com/apache/spark/commit/ef20dbe272139c26701e294995586d2f093f75fc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70872/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org