Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/04 15:36:39 UTC

[jira] [Updated] (SPARK-12016) word2vec load model can't use findSynonyms to get words

     [ https://issues.apache.org/jira/browse/SPARK-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12016:
------------------------------
    Assignee: Liang-Chi Hsieh

> word2vec load model can't use findSynonyms to get words 
> --------------------------------------------------------
>
>                 Key: SPARK-12016
>                 URL: https://issues.apache.org/jira/browse/SPARK-12016
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2
>         Environment: ubuntu 14.04
>            Reporter: yuangang.liu
>            Assignee: Liang-Chi Hsieh
>             Fix For: 2.0.0
>
>
> I use Word2Vec.fit to train a Word2VecModel and then save the model to the file system. When I load the model back from the file system, I can use transform('a') to get a vector, but findSynonyms('a', 2) does not return any words.
> I use the following code to test word2vec:
> from pyspark import SparkContext
> from pyspark.mllib.feature import Word2Vec, Word2VecModel
> import tempfile
> from shutil import rmtree
> if __name__ == '__main__':
>     sc = SparkContext('local', 'test')
>     # Tiny corpus: "a" co-occurs mostly with "b" and sometimes with "c".
>     sentence = "a b " * 100 + "a c " * 10
>     localDoc = [sentence, sentence]
>     doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
>     model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)
>     # Synonyms from the freshly trained model.
>     syms = model.findSynonyms("a", 2)
>     print([s[0] for s in syms])
>     # Save the model, then load it back from the same path.
>     path = tempfile.mkdtemp()
>     model.save(sc, path)
>     sameModel = Word2VecModel.load(sc, path)
>     # The vectors of the trained and the loaded model match.
>     print(model.transform("a") == sameModel.transform("a"))
>     # Synonyms from the loaded model: this is the call that misbehaves.
>     syms = sameModel.findSynonyms("a", 2)
>     print([s[0] for s in syms])
>     # Clean up the temporary model directory.
>     try:
>         rmtree(path)
>     except OSError:
>         pass
> The first print outputs "[u'b', u'c']".
> After loading, the comparison prints "True", but the second findSynonyms call prints "[u'__class__']".
> I don't know how to get 'b' or 'c' from sameModel.findSynonyms("a", 2).
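
Since transform() still returns correct vectors on the loaded model, one possible interim workaround is to rank candidate words by cosine similarity in plain Python. The sketch below is only an illustration and not the fix applied for this issue: the helper names (cosine, find_synonyms_by_cosine) and the toy vectors are hypothetical, and it assumes the caller already has a word-to-vector mapping, for example built by calling sameModel.transform(w) for each word w of a known vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def find_synonyms_by_cosine(vectors, word, num):
    """Return up to `num` (word, similarity) pairs closest to `word`.

    `vectors` is a hypothetical dict mapping each word to its vector;
    it is an assumption of this sketch, not part of the PySpark API.
    """
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Toy vectors standing in for the output of transform(); values are made up.
toy_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
result = find_synonyms_by_cosine(toy_vectors, "a", 2)
print([w for w, _ in result])
```

With real vectors the ranking may differ from what findSynonyms reports on the original model, so this should be treated as a stopgap until the loaded model behaves correctly.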



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
