Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/04 15:36:39 UTC
[jira] [Updated] (SPARK-12016) word2vec load model can't use findSynonyms to get words
[ https://issues.apache.org/jira/browse/SPARK-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-12016:
------------------------------
Assignee: Liang-Chi Hsieh
> word2vec load model can't use findSynonyms to get words
> --------------------------------------------------------
>
> Key: SPARK-12016
> URL: https://issues.apache.org/jira/browse/SPARK-12016
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.5.2
> Environment: Ubuntu 14.04
> Reporter: yuangang.liu
> Assignee: Liang-Chi Hsieh
> Fix For: 2.0.0
>
>
> I use Word2Vec.fit to train a Word2VecModel and then save the model to the file system. When I load the model back from the file system, transform('a') still returns a vector, but findSynonyms('a', 2) does not return the expected words.
> I used the following code to test Word2Vec:
> from pyspark import SparkContext
> from pyspark.mllib.feature import Word2Vec, Word2VecModel
> import tempfile
> from shutil import rmtree
>
> if __name__ == '__main__':
>     sc = SparkContext('local', 'test')
>     # Toy corpus: "a" co-occurs mostly with "b" and sometimes with "c".
>     sentence = "a b " * 100 + "a c " * 10
>     localDoc = [sentence, sentence]
>     doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
>     model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)
>
>     # Synonyms from the freshly trained model.
>     syms = model.findSynonyms("a", 2)
>     print([s[0] for s in syms])
>
>     # Save the model, then load it back.
>     path = tempfile.mkdtemp()
>     model.save(sc, path)
>     sameModel = Word2VecModel.load(sc, path)
>
>     # The word vector survives the round trip...
>     print(model.transform("a") == sameModel.transform("a"))
>
>     # ...but findSynonyms on the loaded model does not return 'b' or 'c'.
>     syms = sameModel.findSynonyms("a", 2)
>     print([s[0] for s in syms])
>
>     try:
>         rmtree(path)
>     except OSError:
>         pass
> The first print outputs [u'b', u'c'], the second prints True, but the third prints [u'__class__'].
> I don't know how to get 'b' or 'c' from sameModel.findSynonyms("a", 2).
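For reference, PySpark documents findSynonyms as returning (word, cosineSimilarity) pairs: the other vocabulary words ranked by cosine similarity to the query word's vector, with the query word itself left out. A correctly loaded model should therefore yield words such as 'b' and 'c' here, never an attribute name like '__class__'. The plain-Python sketch below illustrates that lookup on made-up vectors. It is not Spark's implementation; cosine, find_synonyms and toy_vectors are hypothetical names used only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_synonyms(vectors, word, num):
    """Return the `num` words most similar to `word` as (word, similarity) pairs.

    `vectors` is a hypothetical dict mapping each vocabulary word to its
    embedding (a list of floats); it stands in for the model's word vectors.
    The query word itself is excluded from the results.
    """
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Made-up 3-dimensional vectors, for illustration only.
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.5, 0.5, 0.0],
    "d": [0.0, 0.0, 1.0],
}

print([w for w, _ in find_synonyms(toy_vectors, "a", 2)])  # ['b', 'c']
```

With these toy vectors "b" (similarity about 0.99) outranks "c" (about 0.71), and "d" (0.0) is cut off by num=2, which mirrors the [u'b', u'c'] result the freshly trained model gives in the report.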
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)