Posted to user@spark.apache.org by "Jan Brabec (janbrabe)" <ja...@cisco.com> on 2019/03/12 09:39:25 UTC

Mutating broadcast variable from executors, any risks even if done in a thread-safe manner?

Hello,

I have a quite specific use case. I want to use an MXNet neural-net model in a distributed fashion to get predictions on a very large dataset. It is not possible to broadcast the model directly because the underlying implementation is not serializable. Instead, the model has to be loaded directly on the executors. What we do at the moment (and it works) is broadcast a wrapper class, and the model is loaded into a lazy val inside the wrapper on first use. This is nice because we do not need to load the model for each partition but only once per executor, which makes the job more efficient. However, because initializing the lazy val changes the wrapper's state, we are mutating the broadcast variable on the executors, albeit in a thread-safe manner.
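To make the setup concrete, here is a minimal sketch of the wrapper pattern, not our actual code. It runs without Spark or MXNet: `FakeModel`, `ModelWrapper`, `LoadCounter`, and `Demo.roundTrip` are hypothetical names used only for illustration, and plain Java serialization stands in for shipping the broadcast value to an executor. The point is that only the path is serialized, while the `@transient lazy val` is initialized on first access after deserialization (Scala synchronizes lazy val initialization, which is where the thread safety comes from).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the non-serializable MXNet model.
class FakeModel(val path: String) {
  def predict(x: Double): Double = x * 2.0
}

// Counts how many times a model is actually constructed in this JVM.
object LoadCounter {
  val loads = new AtomicInteger(0)
}

// Only `path` travels over the wire. The model field is transient,
// so it is skipped during serialization and rebuilt lazily afterwards.
class ModelWrapper(val path: String) extends Serializable {
  @transient lazy val model: FakeModel = {
    LoadCounter.loads.incrementAndGet()
    new FakeModel(path)
  }
}

object Demo {
  // Serializes and deserializes a value, imitating what happens when a
  // broadcast value is sent from the driver to an executor.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

In the real job the wrapper would be passed to `sc.broadcast(...)` and `model` accessed inside the task closure. Whether the deserialized wrapper lives for the whole lifetime of an executor is an assumption of this sketch, not something it demonstrates.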

I understand that broadcast variables are meant to hold immutable values, but that is simply not convenient for us. Are there any risks in what we do? Are we shooting ourselves in the foot, and is there a better way to achieve what we want?

Best
Jan