
Posted to user@spark.apache.org by anupamme <me...@gmail.com> on 2015/03/04 07:16:29 UTC

how to save Word2VecModel

Hello

I have started using Spark and am working with Word2VecModel. However, I am
not able to save the trained model. Here is what I am doing:

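import pickle
from pyspark.mllib.feature import Word2Vec

# Each line of the corpus becomes a list of space-separated tokens
inp = sc.textFile("/Users/mediratta/code/word2vec/trunk-d/sub-5").map(lambda row: row.split(" "))
word2vec = Word2Vec()
model = word2vec.fit(inp)
with open('abc.bin', 'wb') as out:
    pickle.dump(model, out, pickle.HIGHEST_PROTOCOL)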

But I get this error:

"It appears that you are attempting to reference SparkContext from a
broadcast "
Exception: It appears that you are attempting to reference SparkContext from
a broadcast variable, action, or transforamtion. SparkContext can only be
used on the driver, not in code that it run on workers. For more
information, see SPARK-5063.

However, when I call pickle.dump on a list instead of a Word2VecModel,
pickle.dump works fine.

So it seems the error is caused by the type of the first argument
(Word2VecModel in this case). However, the error message seems misleading.

Any clue what I am doing wrong?
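The message is less misleading than it looks. PySpark blocks pickling of SparkContext by raising this exception from SparkContext.__getnewargs__ (the SPARK-5063 guard). A PySpark model wrapper keeps a reference to the SparkContext (and to a JVM-side model), so pickling the model reaches that guard, while a plain list does not. The following is a minimal, Spark-free sketch of that mechanism. FakeSparkContext and FakeWord2VecModel are hypothetical stand-ins, not PySpark classes, and the sketch assumes pickle protocol 2 or higher, which is when pickle consults __getnewargs__.

```python
import pickle


class FakeSparkContext:
    """Hypothetical stand-in for pyspark.SparkContext's pickling guard."""

    def __getnewargs__(self):
        # With pickle protocol >= 2, pickle calls this hook while serializing
        # the object; raising here mimics how the real context refuses it.
        raise Exception("It appears that you are attempting to reference "
                        "SparkContext from a broadcast variable, action, or "
                        "transformation (see SPARK-5063).")


class FakeWord2VecModel:
    """Hypothetical model wrapper that, like a PySpark model, keeps a context handle."""

    def __init__(self, sc):
        self._sc = sc


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return None
    except Exception as exc:
        return str(exc)


# A plain list has no context reference, so it pickles without error.
print(try_pickle(["a", "plain", "list"]))
# The wrapper's _sc attribute drags the guarded context into the pickle.
print(try_pickle(FakeWord2VecModel(FakeSparkContext())))
```

In other words, the failure comes from what the model object contains rather than from its type name.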



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-save-Word2VecModel-tp21900.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: how to save Word2VecModel

Posted by Xiangrui Meng <me...@gmail.com>.

There is a JIRA for this:
https://issues.apache.org/jira/browse/SPARK-5692, a subtask of
SPARK-4587. We implemented save/load for linear models, trees, and ALS
in Spark 1.3, but we don't support Word2VecModel yet. As a hack, you
can try Java serialization:

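import org.apache.spark.mllib.feature.Word2VecModel
// Java-serialize the model into a single-partition object file, then read it back
sc.parallelize(Seq(model), 1).saveAsObjectFile("path")
val sameModel = sc.objectFile[Word2VecModel]("path").first()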
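The Java-serialization hack above goes through the Scala API. On the Python side, the observation that plain lists pickle fine suggests a different workaround: pickle the learned vectors as plain Python data instead of the model wrapper. This is a minimal sketch that assumes you already have a word-to-vector mapping. How you extract that mapping depends on your PySpark version; one option is calling model.transform(word) for each vocabulary word. save_vectors and load_vectors are hypothetical helper names. The reloaded object is only a dict of vectors, not a Word2VecModel, so methods such as findSynonyms would have to be reimplemented on top of it.

```python
import os
import pickle
import tempfile


def save_vectors(path, vectors):
    """Pickle a {word: vector} mapping as plain Python data (hypothetical helper).

    Each vector is converted to a list of floats, so the pickle contains no
    Spark or Py4J objects.
    """
    plain = {word: [float(x) for x in vec] for word, vec in vectors.items()}
    with open(path, "wb") as out:
        pickle.dump(plain, out, pickle.HIGHEST_PROTOCOL)


def load_vectors(path):
    """Load the {word: [float, ...]} mapping written by save_vectors."""
    with open(path, "rb") as inp:
        return pickle.load(inp)


# Usage with made-up vectors standing in for the trained model's output.
vectors = {"spark": [0.1, 0.2, 0.3], "word2vec": [0.4, 0.5, 0.6]}
path = os.path.join(tempfile.mkdtemp(), "vectors.pkl")
save_vectors(path, vectors)
restored = load_vectors(path)
print(restored == vectors)
```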


