Posted to user@spark.apache.org by Tobias Pfeiffer <tg...@preferred.jp> on 2014/12/18 10:43:03 UTC
Semantics of foreachPartition()
Hi,
I have the following code in my application:
tmpRdd.foreach(item => {
  println("abc: " + item)
})
tmpRdd.foreachPartition(iter => {
  iter.map(item => {
    println("xyz: " + item)
  })
})
In the output, I see only the "abc" lines (i.e. those from the foreach()
call). The result is the same if I swap the order of the two calls. What
exactly are the semantics of foreachPartition(), and how do I use it
correctly?
Thanks
Tobias
Re: Semantics of foreachPartition()
Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi again,
On Thu, Dec 18, 2014 at 6:43 PM, Tobias Pfeiffer <tg...@preferred.jp> wrote:
>
> tmpRdd.foreachPartition(iter => {
>   iter.map(item => {
>     println("xyz: " + item)
>   })
> })
>
Uh, with iter.foreach(...) it works. The reason is apparently that
iter.map() itself returns an iterator and is therefore evaluated lazily (in
this case: never), while iter.foreach() is evaluated immediately.
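
The difference can be reproduced without Spark. Below is a minimal sketch
in plain Scala, assuming a small Iterator(1, 2, 3) as a stand-in for the
iterator that foreachPartition() passes to the function. The object name
LazyIteratorDemo and the helper run() are made up for illustration and are
not part of any Spark API.

```scala
object LazyIteratorDemo {
  // Hypothetical helper: returns how often the body passed to map and the
  // body passed to foreach actually ran.
  def run(): (Int, Int) = {
    var mapped = 0
    var visited = 0

    // Stand-in for one partition; each call yields a fresh iterator.
    def partition: Iterator[Int] = Iterator(1, 2, 3)

    // map only builds a new lazy iterator. Nothing consumes the result,
    // so the body is never executed.
    partition.map { item => mapped += 1; item }

    // foreach consumes the iterator right away, so the body runs once
    // per item.
    partition.foreach { _ => visited += 1 }

    (mapped, visited)
  }

  def main(args: Array[String]): Unit = {
    val (mapped, visited) = run()
    // prints "map ran 0 times, foreach ran 3 times"
    println(s"map ran $mapped times, foreach ran $visited times")
  }
}
```

So inside foreachPartition(), side effects belong in iter.foreach(...) (or
in something else that actually consumes the iterator), not in iter.map(...).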
Thanks
Tobias