Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2016/01/07 02:03:39 UTC

[jira] [Commented] (SPARK-12680) Loading Word2Vec model in pyspark gives "ValueError: too many values to unpack" in findSynonyms

    [ https://issues.apache.org/jira/browse/SPARK-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086602#comment-15086602 ] 

Joseph K. Bradley commented on SPARK-12680:
-------------------------------------------

I'm having trouble reproducing this on Spark 1.6. Are you able to try it with 1.6?

(I'm testing on 1.5 now.)

> Loading Word2Vec model in pyspark gives "ValueError: too many values to unpack" in  findSynonyms
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12680
>                 URL: https://issues.apache.org/jira/browse/SPARK-12680
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 1.5.2
>            Reporter: Sloane Simmons
>
> I can train a model with Word2Vec and then persist it with Word2VecModel#save.  If I load the saved model in pyspark (using python 2.7.10), I get the following error (model.transform included to show that other methods work).
> {code}
> In [3]: from pyspark.mllib.feature import Word2VecModel
> In [4]: model = Word2VecModel.load(sc,"word_vec_from_cleaned_query.model")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> 16/01/06 12:36:11 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
> 16/01/06 12:36:11 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
> In [5]: model.findSynonyms('white',10)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-5-6105c004edd9> in <module>()
> ----> 1 model.findSynonyms('white',10)
> /usr/local/Cellar/apache-spark/1.5.2/libexec/python/pyspark/mllib/feature.pyc in findSynonyms(self, word, num)
>     448         if not isinstance(word, basestring):
>     449             word = _convert_to_vector(word)
> --> 450         words, similarity = self.call("findSynonyms", word, num)
>     451         return zip(words, similarity)
>     452 
> ValueError: too many values to unpack
> In [6]: model.transform('white')
> Out[6]: DenseVector([-0.0213, 0.2292, -0.2012, 0.107, -0.1475, 0.0578, 0.0731, -0.098, -0.1528, 0.1077, 0.0158, -0.0155, -0.1487, 0.0343, 0.2244, 0.0447, 0.2362, -0.1767, 0.064, -0.0148, -0.1291, -0.0171, -0.0642, -0.0754, 0.0417, 0.1547, 0.2745, -0.1178, -0.2895, -0.1314, 0.1023, -0.11, 0.0142, 0.0156, 0.1102, 0.0785, -0.0981, 0.0504, -0.0627, -0.0773, 0.0023, 0.1826, 0.1759, -0.1581, 0.3913, 0.0829, 0.0728, 0.1478, -0.0123, -0.1745, 0.2762, 0.0312, 0.138, 0.0786, -0.0546, 0.5123, 0.237, -0.0241, 0.1594, -0.0645, -0.0425, 0.1265, 0.0305, -0.3164, 0.0601, 0.0565, 0.0066, -0.0818, -0.384, -0.1513, 0.0775, -0.2278, -0.1478, -0.0659, -0.0778, 0.3194, -0.1931, -0.2991, 0.1629, 0.1018, -0.0603, 0.1091, -0.0334, -0.0513, 0.1067, 0.1273, 0.1187, 0.0461, 0.0407, 0.0515, 0.0958, 0.0498, -0.1561, 0.1726, -0.006, -0.0262, -0.0106, 0.1623, 0.1477, -0.0509])
> In [7]: 
> {code}
> I think that this is a pyspark-specific error, since I can load the trained model in the scala spark-shell and use findSynonyms:
> {code}
> scala> import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
> import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
> scala> val model = Word2VecModel.load(sc,"word_vec_from_cleaned_query.model")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> 16/01/06 14:17:14 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
> 16/01/06 14:17:14 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
> model: org.apache.spark.mllib.feature.Word2VecModel = org.apache.spark.mllib.feature.Word2VecModel@2e7da886
> scala> model.findSynonyms("white",10)
> res0: Array[(String, Double)] = Array((stylish,0.8347662648041679), (shirt,0.7721922530954246), (stripe,0.7311193884955149), (striped,0.7033047124091971), (buttons,0.6891310548525095), (womens,0.671437501511924), (zaful,0.6659281321485323), (dorateymur,0.6654344754707424), (womenns,0.6637001786899768), (long,0.6573707323598634))
> scala> 
> {code}
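
For anyone reading the traceback above: the failing line unpacks the result of self.call("findSynonyms", ...) into exactly two names, so the ValueError means the call returned a sequence with more than two items. The plain-Python sketch below illustrates that mechanism only. It does not use Spark, the helper names (unpack_pair, describe) and the sample data are hypothetical, and the idea that the loaded model hands back one (word, similarity) tuple per synonym, as the Scala return type Array[(String, Double)] suggests, is an assumption that this thread does not confirm.

```python
# Hypothetical stand-ins for what self.call("findSynonyms", word, num)
# might hand back to Python. No Spark is involved in this sketch.


def unpack_pair(result):
    """Mimic `words, similarity = self.call(...)` followed by zip()."""
    words, similarity = result  # requires exactly two items in `result`
    return list(zip(words, similarity))


def describe(result):
    """Return the zipped synonyms, or the error text if unpacking fails."""
    try:
        return unpack_pair(result)
    except ValueError as exc:
        return "ValueError: %s" % exc


# Shape the Python code expects: two parallel sequences.
pair_shaped = [["stylish", "shirt", "stripe"], [0.83, 0.77, 0.73]]

# Assumed shape on the failing path: one tuple per synonym, so asking for
# 10 synonyms would yield 10 items instead of 2. Three are shown here.
tuple_shaped = [("stylish", 0.83), ("shirt", 0.77), ("stripe", 0.73)]

print(describe(pair_shaped))
print(describe(tuple_shaped))
```

If that assumption holds, it would also be consistent with model.transform working, since transform returns a single vector and involves no unpacking.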


