Posted to user@spark.apache.org by Tobias Pfeiffer <tg...@preferred.jp> on 2014/12/03 08:44:39 UTC

Does count() evaluate all mapped functions?

Hi,

I have an RDD and a function that should be called on every item in this
RDD once (say it updates an external database). So far, I used
  rdd.map(myFunction).count()
or
  rdd.mapPartitions(iter => iter.map(myFunction))
but I am wondering whether this always triggers a call to myFunction in both
cases. In the first case in particular, count() returns the same value whether
or not myFunction is called for each element, so I was wondering if I can
rely on count() evaluating the whole pipeline, including functions that
cannot change the count.
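To make the concern concrete, here is a minimal plain-Scala sketch of the
per-partition iterator in the second variant. No Spark is involved. The
names run, calls and the local myFunction are hypothetical stand-ins, and the
counter models the database update. Scala's Iterator.map is lazy, so
myFunction only runs once something consumes the iterator:

```scala
// Plain-Scala stand-in for iter.map(myFunction) inside mapPartitions; no Spark.
object LazyMapDemo {
  // Returns (calls right after map, element count, calls after counting).
  def run(xs: List[Int]): (Int, Int, Int) = {
    var calls = 0
    // Hypothetical stand-in for myFunction: the increment models the side effect.
    def myFunction(x: Int): Int = { calls += 1; x * 2 }

    val mapped = xs.iterator.map(myFunction)
    val before = calls  // Iterator.map is lazy, so nothing has run yet
    val n = mapped.size // counting consumes the iterator, calling myFunction per element
    (before, n, calls)
  }

  def main(args: Array[String]): Unit =
    println(run(List(1, 2, 3))) // prints (0,3,3)
}
```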

Thanks
Tobias