Posted to user@spark.apache.org by Michael Albert <m_...@yahoo.com.INVALID> on 2015/10/02 18:33:21 UTC

are functions deserialized once per task?

Greetings!
Is it true that functions, such as those passed to RDD.map(), are deserialized once per task?This seems to be the case looking at Executor.scala, but I don't really understand the code.
I'm hoping the answer is yes, because that makes it easier to write code without worrying about thread safety. For example, suppose I have something like this:

    class FooToBarTransformer {
      def transform(foo: Foo): Bar = ...
    }
Now I want to do something like this:

    val rddFoo: RDD[Foo] = ...
    val transformer = new FooToBarTransformer
    val rddBar = rddFoo.map(foo => transformer.transform(foo))
If the "transformer" object is deserialized once per task, then I do not need to worry whether "transform()" is thread safe.If, for example, the implementation tried "optimize" matters by caching the deserialization, so that one object was sharedby all threads in a single JVM, then presumably one would need to worry about the thread safety of transform().
Is my understanding correct? Is this likely to continue to be true in future releases? Answers of "yes" would be much appreciated :-).
Thanks!
-Mike