Posted to dev@spark.apache.org by Pralabh Kumar <pr...@gmail.com> on 2020/12/11 06:11:32 UTC
Unable to pickle pySpark PipelineModel
Hi Dev, User,
I want to store a Spark ML model in a database so that I can reuse it
later. I am unable to pickle it from PySpark. However, using Scala I am
able to convert the model into a byte array stream.
So, for example, I am able to do the following in Scala but not in Python:
val modelToByteArray = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(modelToByteArray)
oos.writeObject(model)
oos.flush()
oos.close()
spark.sparkContext.parallelize(Seq((model.uid, "my-neural-network-model",
    modelToByteArray.toByteArray)))
  .saveToCassandra("dfsdfs", "models", SomeColumns("uid", "name", "model"))
But pickle.dumps(model) in PySpark throws the error:
cannot pickle '_thread.RLock' object
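I can reproduce the same failure outside Spark. As far as I understand, the PySpark model wraps a py4j JavaObject, and the py4j gateway holds a threading lock, which pickle refuses to serialize. Here is a minimal stand-alone sketch (FakeModel is a hypothetical stand-in for the real PipelineModel):

```python
import pickle
import threading

# Hypothetical stand-in for a PySpark model object: the real PipelineModel
# holds a reference to the py4j gateway, which contains a threading.RLock.
class FakeModel:
    def __init__(self):
        # stands in for the lock held inside the real model's JVM gateway
        self._lock = threading.RLock()

model = FakeModel()
try:
    pickle.dumps(model)
    error = None
except TypeError as exc:
    error = exc

print(error)  # cannot pickle '_thread.RLock' object
```

For what it's worth, I am aware that model.write().overwrite().save(path) (the MLWriter path) works, but it writes to a filesystem path rather than giving me a byte array I can store in the database directly.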
Please help with the same.
Regards
Pralabh