Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:12:32 UTC

[jira] [Resolved] (SPARK-20902) Word2Vec implementations with Negative Sampling

     [ https://issues.apache.org/jira/browse/SPARK-20902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-20902.
----------------------------------
    Resolution: Incomplete

> Word2Vec implementations with Negative Sampling
> -----------------------------------------------
>
>                 Key: SPARK-20902
>                 URL: https://issues.apache.org/jira/browse/SPARK-20902
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.1.1
>            Reporter: Shubham Chopra
>            Priority: Major
>              Labels: ML, bulk-closed
>
> Spark MLlib Word2Vec currently implements only Skip-Gram with hierarchical softmax. Both Continuous Bag of Words (CBOW) and Skip-Gram have shown comparable or better performance with negative sampling. This umbrella JIRA tracks the effort to add negative-sampling-based implementations of both the CBOW and Skip-Gram models to Spark MLlib.
> Since word2vec is largely a pre-processing step, its performance often depends on the application it is used for and the corpus it is estimated on. These implementations give users the choice of picking the one that works best for their use case.
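
For readers unfamiliar with the term, negative sampling replaces the full (or hierarchical) softmax with a few binary classifications per training pair: the true context word is scored against a handful of randomly drawn "negative" words. The following is a minimal, illustrative sketch in plain Python and is not taken from Spark or from this issue. The function names (sgns_loss, sample_negatives) are hypothetical, and the 0.75 exponent on word counts is the commonly used unigram smoothing, assumed here rather than stated in the issue.

```python
import math
import random


def sigmoid(x):
    # Logistic function; adequate for the small dot products in this sketch.
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(center_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one (center, context) training pair.

    The true context word is pushed toward a high score and each sampled
    negative word toward a low score.
    """
    loss = -math.log(sigmoid(dot(center_vec, context_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(center_vec, neg)))
    return loss


def sample_negatives(counts, k, rng, exclude=None, power=0.75):
    """Draw k negative words with probability proportional to count**power.

    `counts` maps word -> corpus frequency; `exclude` is the true context
    word, which is never returned as a negative.
    """
    words = [w for w in counts if w != exclude]
    weights = [counts[w] ** power for w in words]
    return rng.choices(words, weights=weights, k=k)
```

The cost per training pair grows with the number of negatives rather than with the vocabulary size, which is the usual motivation for offering it alongside hierarchical softmax.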



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org