Posted to user@spark.apache.org by Markus Losoi <ma...@gmail.com> on 2013/11/04 21:22:05 UTC

Receiving intermediary results of a Spark operation

Hi

Is it possible for a driver program to receive intermediary results of a
Spark operation? If, e.g., a long map() operation is in progress, can the
driver become aware of some of the (key, value) pairs before all of them are
computed?

There seems to be a SparkListener interface that has an onTaskEnd() event [1].
However, the documentation is somewhat sparse on what kind of information is
included in a SparkListenerTaskEnd object [2].

[1] http://spark.incubator.apache.org/docs/0.8.0/api/core/org/apache/spark/scheduler/SparkListener.html
[2] http://spark.incubator.apache.org/docs/0.8.0/api/core/org/apache/spark/scheduler/SparkListenerTaskEnd.html
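(For reference, a listener is a class extending SparkListener that overrides the callbacks it cares about. The sketch below is illustrative only: the class name is made up, the exact registration call in 0.8.0 may differ, and, importantly, onTaskEnd reports task-level metadata and metrics rather than the (key, value) pairs the task produced.)

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Illustrative sketch: prints a line each time a task finishes.
// Note that onTaskEnd exposes task metadata/metrics, not the task's output.
class ProgressListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    println("task finished: " + taskEnd)
  }
}

// Registration (the location of this call varies across Spark versions):
// sc.addSparkListener(new ProgressListener)
```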

Best regards,
Markus Losoi (markus.losoi@gmail.com)


Re: Receiving intermediary results of a Spark operation

Posted by Horia <ho...@alum.berkeley.edu>.
I may be wrong here, but it seems to me that such functionality runs contrary
to the paradigm that Spark enforces.

Here is why I say that: Spark doesn't execute transformations, such as 'map',
until an action is requested, such as 'collect'. Therefore, explicitly
retrieving partially completed chunks of a 'map' call seems counter to the
Spark MO.

-- Horia
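(To make the laziness concrete, a minimal sketch, assuming an existing SparkContext `sc` and a placeholder input file:)

```scala
// Nothing is computed here: 'map' only records the transformation
// in the RDD's lineage.
val lengths = sc.textFile("input.txt").map(line => line.length)

// Computation happens only when an action such as 'collect' runs, and the
// driver sees the results only after the whole job has finished.
val all = lengths.collect()
```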

Re: Receiving intermediary results of a Spark operation

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Sorry, meant to include the link:
http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#accumulators
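
(A rough sketch of the accumulator approach, assuming an existing SparkContext `sc`; the input path is a placeholder. Tasks increment the accumulator on the executors as they process records; the driver can read its value reliably after the action completes, or poll it from another thread for a best-effort progress estimate while the job runs.)

```scala
// Sketch only: assumes an existing SparkContext `sc`.
val processed = sc.accumulator(0)

val lengths = sc.textFile("hdfs://.../input").map { line =>
  processed += 1                // updated on the executors as tasks run
  line.length
}

val result = lengths.collect() // action: triggers the actual computation

// Guaranteed to be up to date once the action has returned; reading it from
// another thread mid-job gives only a best-effort progress estimate.
println("processed " + processed.value + " records")
```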




Re: Receiving intermediary results of a Spark operation

Posted by Mark Hamstra <ma...@clearstorydata.com>.
You probably want to be looking at accumulators.

