Posted to dev@spark.apache.org by Pralabh Kumar <pr...@gmail.com> on 2020/12/11 06:11:32 UTC

Unable to pickle pySpark PipelineModel

Hi Dev, User,

I want to store a Spark ML model in a database so that I can reuse it
later on, but I am unable to pickle it. However, using Scala I am able
to convert the model into a byte array stream.

For example, I am able to do the following in Scala but not in Python:

 val modelToByteArray = new ByteArrayOutputStream()
 val oos = new ObjectOutputStream(modelToByteArray)
 oos.writeObject(model)
 oos.flush()
 oos.close()

 spark.sparkContext
   .parallelize(Seq((model.uid, "my-neural-network-model", modelToByteArray.toByteArray)))
   .saveToCassandra("dfsdfs", "models", SomeColumns("uid", "name", "model"))


But pickle.dumps(model) in PySpark throws an error:

cannot pickle '_thread.RLock' object
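The failure is expected: a PySpark PipelineModel is a thin wrapper around a JVM object and holds a reference to the SparkContext, which contains an unpicklable _thread.RLock, so Python's pickle cannot serialize it. A common workaround is to save the model with Spark ML's own writer (model.write().overwrite().save(path)) and then zip the resulting directory into a single byte blob for the database. Below is a minimal sketch of the directory-to-bytes round trip; the helper names dir_to_bytes/bytes_to_dir are hypothetical, and the Spark calls are shown only in comments:

```python
import io
import os
import zipfile

def dir_to_bytes(path):
    """Zip a saved-model directory into a single bytes blob for DB storage."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                # store paths relative to the model directory root
                zf.write(full, os.path.relpath(full, path))
    return buf.getvalue()

def bytes_to_dir(blob, path):
    """Unzip a stored blob back into a directory that PipelineModel.load can read."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        zf.extractall(path)

# Usage with Spark (sketch, not run here):
#   model.write().overwrite().save("/tmp/my_model")   # Spark ML's own writer
#   blob = dir_to_bytes("/tmp/my_model")              # store this blob in the database
#   ...later...
#   bytes_to_dir(blob, "/tmp/restored_model")
#   from pyspark.ml import PipelineModel
#   model = PipelineModel.load("/tmp/restored_model")
```

This keeps the database schema the same as in the Scala version (a blob column), while using Spark's supported persistence format instead of Java/Python object serialization.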


Please help with the same.


Regards

Pralabh