You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jasoncl <gi...@git.apache.org> on 2016/04/15 23:39:40 UTC

[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

GitHub user jasoncl opened a pull request:

    https://github.com/apache/spark/pull/12428

    [Spark-14564] [ML] [MLlib] [PySpark] Python Word2Vec missing setWindowSize method

    ## What changes were proposed in this pull request?
    Added windowSize getter/setter to ML/MLlib
    
    
    ## How was this patch tested?
    Added test cases in tests.py under both ML and MLlib
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jasoncl/spark SPARK-14564

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12428
    
----
commit 1d9ac29f7c5daeabb0abf4c605dc0671f389e91c
Author: Jason Lee <cj...@us.ibm.com>
Date:   2016-04-12T19:58:48Z

    added windowSize setter and getter for python word2Vec

commit 1e373aae7d7931ed24d61b9f3a0268f46adf76c2
Author: Jason Lee <cj...@us.ibm.com>
Date:   2016-04-14T18:01:26Z

    added windowSize test case

commit 859dd1b2bb4d6a9c4339c7fd628eab4dc53ea980
Author: Jason Lee <cj...@us.ibm.com>
Date:   2016-04-14T18:02:56Z

    added windowSize test case

commit 2f757c837377a0b24ad689ee139d20a6bfe92451
Author: Jason Lee <cj...@us.ibm.com>
Date:   2016-04-14T18:35:16Z

    added assertEqual for windowSize test case

commit 2614fe89a6cf110f33a3cb3862be36e15742873d
Author: Jason Lee <cj...@us.ibm.com>
Date:   2016-04-14T20:20:29Z

    set correct version

commit 83bbdc10e073988b0664d29ff0f73fe7d2629e4b
Author: Jason Lee <cj...@us.ibm.com>
Date:   2016-04-14T23:07:44Z

    added a test case in ml

commit 417a7d48e3167bed319b8f2472971b61b4024590
Author: Jason Lee <cj...@us.ibm.com>
Date:   2016-04-15T21:33:54Z

    rephrase doc for windowSize

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-210658472
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12428#discussion_r60011616
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -339,6 +339,12 @@ def test_param_property_error(self):
             params = param_store.params  # should not invoke the property 'test_property'
             self.assertEqual(len(params), 1)
     
    +    def test_word2vec_param(self):
    +        model = Word2Vec() \
    +            .setWindowSize(6)
    --- End diff --
    
    minor, but this can be one line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211547735
  
    LGTM
    Thanks for the PR & others for reviewing!
    Merging with master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211234146
  
    One minor comment. Pending that, the tests and @holdenk's comments, LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211502062
  
    LGTM pending tests :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12428#discussion_r59999797
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2173,28 +2173,31 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, HasSeed, HasInputCol, Has
         minCount = Param(Params._dummy(), "minCount",
                          "the minimum number of times a token must appear to be included in the " +
                          "word2vec model's vocabulary", typeConverter=TypeConverters.toInt)
    +    windowSize = Param(Params._dummy(), "windowSize",
    +                       "the window size (context words from [-window, window])",
    --- End diff --
    
    Should mention the default value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211156441
  
    Thanks for taking the initiative to do this. A few minor comments from the first pass through, but in the meantime maybe one the admins (possibly @jkbradley) could either say ok to jenkins to test or add to the whitelist?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211515461
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56080/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12428


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211515145
  
    **[Test build #56080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56080/consoleFull)** for PR 12428 at commit [`5cdcf22`](https://github.com/apache/spark/commit/5cdcf22cf0090e3f58b5f1b19c18c350cd6e2d5c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211487611
  
    **[Test build #56080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56080/consoleFull)** for PR 12428 at commit [`5cdcf22`](https://github.com/apache/spark/commit/5cdcf22cf0090e3f58b5f1b19c18c350cd6e2d5c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211515458
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12428#issuecomment-211227203
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-14564] [ML] [MLlib] [PySpark] Python Wo...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12428#discussion_r59999886
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2245,6 +2248,21 @@ def getMinCount(self):
             """
             return self.getOrDefault(self.minCount)
     
    +    @since("2.0.0")
    +    def setWindowSize(self, value):
    +        """
    +        Sets the value of :py:attr:`windowSize`.
    +        """
    +        self._paramMap[self.windowSize] = value
    --- End diff --
    
    Rather than directly accessing the param map, use the `_set` function (see https://github.com/apache/spark/pull/11939 / SPARK-14104)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org