Posted to user@spark.apache.org by Tobias Pfeiffer <tg...@preferred.jp> on 2014/12/18 10:43:03 UTC

Semantics of foreachPartition()

Hi,

I have the following code in my application:

        tmpRdd.foreach(item => {
          println("abc: " + item)
        })
        tmpRdd.foreachPartition(iter => {
          iter.map(item => {
            println("xyz: " + item)
          })
        })

In the output, I see only the "abc" lines (i.e. those from the foreach() call).
(The result is the same if I swap the order of the two calls.) What exactly are
the semantics of foreachPartition(), and how would I use it correctly?

Thanks
Tobias

Re: Semantics of foreachPartition()

Posted by Tobias Pfeiffer <tg...@preferred.jp>.

Hi again,

On Thu, Dec 18, 2014 at 6:43 PM, Tobias Pfeiffer <tg...@preferred.jp> wrote:
>
>         tmpRdd.foreachPartition(iter => {
>           iter.map(item => {
>             println("xyz: " + item)
>           })
>         })
>

Uh, with iter.foreach(...) it works. The reason is apparently that
iter.map() itself returns an iterator and is thus evaluated lazily (in this
case: never, since nothing consumes the result), while iter.foreach() is
evaluated immediately.
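
The difference can be reproduced without Spark, since the argument passed to
foreachPartition() is an ordinary scala.collection.Iterator. The following is
only a minimal sketch: the object name LazyIteratorDemo, the helper counts()
and the sample values 1, 2, 3 are made up for illustration and are not part of
the application above. It records the messages in a buffer instead of printing
them, so that the number of executed side effects can be inspected.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical demo object; plain Scala iterators stand in for the
// per-partition iterator that Spark hands to foreachPartition().
object LazyIteratorDemo {

  // Returns the number of recorded side effects after each of three steps.
  def counts(): (Int, Int, Int) = {
    val seen = ArrayBuffer.empty[String]

    // map() only wraps the iterator; the function body has not run yet.
    val mapped = Iterator(1, 2, 3).map { item =>
      seen.append("xyz: " + item)
      item
    }
    val afterMap = seen.size

    // foreach() consumes the iterator, so the body runs once per element.
    Iterator(1, 2, 3).foreach { item =>
      seen.append("abc: " + item)
    }
    val afterForeach = seen.size

    // Consuming the mapped iterator finally forces the deferred function.
    mapped.foreach(_ => ())
    val afterConsume = seen.size

    (afterMap, afterForeach, afterConsume)
  }

  def main(args: Array[String]): Unit = {
    println(counts()) // prints (0,3,6)
  }
}
```

In the Spark code above, the iterator returned by iter.map(...) is discarded
without ever being consumed, which matches the first step of the sketch: the
println is never reached.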

Thanks
Tobias