Posted to user@spark.apache.org by silvermast <vt...@paxata.com> on 2014/07/09 09:58:32 UTC

TaskContext stageId = 0

Has anyone else seen this, at least in local mode? I haven't tried it on a
cluster, but I'm getting frustrated that I cannot identify activity within an
RDD's compute() method, whether by stageId or by rddId (the rddId is exposed
on ParallelCollectionPartition but not on ShuffledRDDPartition, and even then
only through reflection). Is anyone else solving this problem?
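To make the symptom concrete, here is a minimal sketch of observing stageId
from inside compute() by wrapping an upstream RDD. This is an illustration,
not code from the thread: ProbeRDD is a made-up name, and it assumes a
Spark 1.x-era Scala build.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical one-to-one wrapper RDD that logs what compute() can see.
// In local mode the logged stageId stays 0 for every stage, which is the
// symptom described above.
class ProbeRDD[T: ClassTag](prev: RDD[T]) extends RDD[T](prev) {
  override protected def getPartitions: Array[Partition] =
    firstParent[T].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    println(s"rddId=$id stageId=${context.stageId} partition=${split.index}")
    firstParent[T].iterator(split, context)
  }
}
```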



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/TaskContext-stageId-0-tp9152.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: TaskContext stageId = 0

Posted by silvermast <vt...@paxata.com>.
Oh well, never mind. The problem is that ResultTask's stageId is immutable
and is used to construct the Task superclass. Anyway, my solution now is to
use this.id for the rddId and to gather all rddIds with a SparkListener on
stage completion, cleaning up any activity registered for those RDDs. I could
use TaskContext's completion hook instead, but I'd have to add more messaging
so that I can clear state that may live on a different executor than the one
my partition runs on, and since I don't know whether that executor will
succeed, this is not safe.
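For the record, the listener-based cleanup described above can be sketched
roughly like this. The names ActivityRegistry and CleanupListener are made
up, the per-RDD state is a placeholder, and stageInfo.rddInfos is the field
name in later Spark releases; the 1.0-era StageInfo exposed the RDD info
slightly differently.

```scala
import scala.collection.concurrent.TrieMap
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Hypothetical per-RDD state, keyed by the RDD id (this.id inside the RDD).
object ActivityRegistry {
  val byRddId = TrieMap.empty[Int, AnyRef]
}

// When a stage completes, drop state for every RDD that ran in that stage.
class CleanupListener extends SparkListener {
  override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
    stage.stageInfo.rddInfos.foreach { info =>
      ActivityRegistry.byRddId.remove(info.id)
    }
  }
}

// Registration, e.g. right after creating the SparkContext:
//   sc.addSparkListener(new CleanupListener)
```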



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/TaskContext-stageId-0-tp9152p9162.html