Posted to reviews@spark.apache.org by shubhamchopra <gi...@git.apache.org> on 2017/05/26 18:51:05 UTC

[GitHub] spark pull request #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative ...

GitHub user shubhamchopra opened a pull request:

    https://github.com/apache/spark/pull/18123

    [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Sampling

    ## What changes were proposed in this pull request?
    
    This enhances [CBOW + Negative Sampling](https://github.com/apache/spark/pull/17673) to be able to estimate Skip-Gram with negative sampling as well.
    
    ## How was this patch tested?
    
    This patch was tested with the unit tests contributed as part of this PR.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shubhamchopra/spark Word2VecSGNS

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18123
    
----
commit 2e777406b9dd69a47952a9650ab2a7a323e93391
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-04-11T21:18:00Z

    Word2Vec CBOW + Negative Sampling implementation.

commit 5d76210a48004b8f6d8bb32e55ba2d67892d72c7
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-04-11T23:24:31Z

    Correcting the negative samples function.

commit 5725209ab48b0d0f2c58559d7b5da0a2dbbdb5d6
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-04-12T16:41:33Z

    Correcting scala style issue.

commit 3fa2be111125a7739e0ed798c8c36df1ce826872
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-04-13T19:01:55Z

    Checking to make sure neg samples is less than the vocab size.

commit 8af4980af7522d7964cd265be8b021a8b39bad59
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-04-13T19:04:59Z

    removing unused function.

commit a8eb7ed50d357156107417c20b073b8ba5aefa81
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-04-13T19:06:15Z

    Adding test cases, similar to the ones for skip-gram based estimation.

commit 69b297b6cb0d57c2b3dd57db2a72da57542f0579
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-02T18:42:56Z

    incorporating code feedback from @Krimit

commit b3683eeee451ffa4190ce24a6be2151212c8a644
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-03T20:35:32Z

    Using HasSolver trait.

commit 27e6c88956517b2db99e6a76b9f05036311f57ba
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-04T01:27:23Z

    Adding sub sampling and incorporating feedback.

commit 96ac2b1e3a2be52c729ef8cd48563520eeb35a93
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-04T18:50:42Z

    Putting the CBOW solver in a separate object.

commit cad67434030d8db4404091e899d382640ae77075
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-05T14:36:15Z

    Removing unused variables.

commit fda2232beb5e173004d1189324274c33521e8945
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-05T17:42:42Z

    Converting log statement to debug

commit e3c62f64c46a846c92127375a7bce99e6a7b15ad
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-23T19:44:30Z

    correcting indentation.

commit bcd984f646b39d6804e6faea89837f847f2754ca
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-23T20:08:27Z

    when using sampling, the sampled sentence should be used downstream.

commit acec6fb5c5c87239c322c8185557696c966aa824
Author: Shubham Chopra <sc...@bloomberg.net>
Date:   2017-05-23T20:52:14Z

    Implementing SkipGram+NegativeSampling, reusing most of CBOW+NegativeSampling implementation.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    **[Test build #80242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80242/testReport)** for PR 18123 at commit [`7955181`](https://github.com/apache/spark/commit/79551816d74ef6cdb40e8450e29b742107613313).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81319/
    Test PASSed.




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    **[Test build #80242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80242/testReport)** for PR 18123 at commit [`7955181`](https://github.com/apache/spark/commit/79551816d74ef6cdb40e8450e29b742107613313).




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    **[Test build #82006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82006/testReport)** for PR 18123 at commit [`b218c4b`](https://github.com/apache/spark/commit/b218c4be6d8540d83a62a7a869ecfa3a4c06ad27).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    **[Test build #81319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81319/testReport)** for PR 18123 at commit [`8775b25`](https://github.com/apache/spark/commit/8775b258af7ec9690f2b12edc6fcc8ad7374870c).




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81232/
    Test PASSed.




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    **[Test build #81232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81232/testReport)** for PR 18123 at commit [`855b692`](https://github.com/apache/spark/commit/855b692f71434ea229248eeb0e4f49c4c898b22d).




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80242/
    Test PASSed.




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    **[Test build #81319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81319/testReport)** for PR 18123 at commit [`8775b25`](https://github.com/apache/spark/commit/8775b258af7ec9690f2b12edc6fcc8ad7374870c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    **[Test build #82006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82006/testReport)** for PR 18123 at commit [`b218c4b`](https://github.com/apache/spark/commit/b218c4be6d8540d83a62a7a869ecfa3a4c06ad27).




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82006/
    Test PASSed.




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by MLnick <gi...@git.apache.org>.
GitHub user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    ok to test




[GitHub] spark issue #18123: [SPARK-20903] [ML] Word2Vec Skip-Gram + Negative Samplin...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18123
  
    **[Test build #81232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81232/testReport)** for PR 18123 at commit [`855b692`](https://github.com/apache/spark/commit/855b692f71434ea229248eeb0e4f49c4c898b22d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



