Posted to user@spark.apache.org by Tian Zhang <tz...@yahoo.com> on 2015/10/14 22:41:58 UTC
Spark streaming checkpoint against s3
Hi, I am trying to point the Spark Streaming checkpoint at S3. Here is basically what I did:
val checkpointDir = "s3://myBucket/checkpoint"
val ssc = StreamingContext.getOrCreate(checkpointDir,
  () => getStreamingContext(sparkJobName,
                            batchDurationSec,
                            classOf[MyClassKryoRegistrator],
                            checkpointDir),
  getHadoopConfiguration)

def getHadoopConfiguration: Configuration = {
  val hadoopConf = new Configuration()
  hadoopConf.set("fs.defaultFS", "s3://" + myBucket + "/")
  hadoopConf.set("fs.s3.awsAccessKeyId", "myAccessKey")
  hadoopConf.set("fs.s3.awsSecretAccessKey", "mySecretKey")
  hadoopConf.set("fs.s3n.awsAccessKeyId", "myAccessKey")
  hadoopConf.set("fs.s3n.awsSecretAccessKey", "mySecretKey")
  hadoopConf
}
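(For reference, Spark can also forward these settings for you: any property prefixed with spark.hadoop. in the Spark configuration is copied into the Hadoop Configuration that Spark builds, so the credentials do not have to be set in code. A minimal sketch, assuming the same placeholder keys as above:

```
# spark-defaults.conf -- placeholder values; spark.hadoop.* properties
# are passed through to the underlying Hadoop Configuration.
spark.hadoop.fs.s3n.awsAccessKeyId     myAccessKey
spark.hadoop.fs.s3n.awsSecretAccessKey mySecretKey
```

Whether this also covers the Configuration used to read the checkpoint on recovery depends on the Spark version, so the explicit getHadoopConfiguration above is the safer option.)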
It works in the sense that I can see it trying to retrieve the checkpoint from S3.
However, it did more than I intended; I saw the following in the log:
15/10/14 19:58:47 ERROR spark.SparkContext: Jar not found at
file:/media/ephemeral0/oncue/mesos-slave/slaves/20151007-172900-436893194-5050-2984-S9/frameworks/20150825-180042-604730890-5050-4268-0003/executors/tian-act-reg.47368a1a-71f9-11e5-ad61-de5fb3a867da/runs/dfc28a6c-48a0-464b-bdb1-d6dd057acd51/artifacts/rna-spark-streaming.jar
Now SparkContext is trying to look up the following path instead of the local file:
file:/media/ephemeral0/oncue/mesos-slave/slaves/20151007-172900-436893194-5050-2984-S9/frameworks/20150825-180042-604730890-5050-4268-0003/executors/tian-act-reg.47368a1a-71f9-11e5-ad61-de5fb3a867da/runs/dfc28a6c-48a0-464b-bdb1-d6dd057acd51/artifacts/rna-spark-streaming.jar
How do I get SparkContext to look at just
/media/ephemeral0/oncue/mesos-slave/slaves/20151007-172900-436893194-5050-2984-S9/frameworks/20150825-180042-604730890-5050-4268-0003/executors/tian-act-reg.47368a1a-71f9-11e5-ad61-de5fb3a867da/runs/dfc28a6c-48a0-464b-bdb1-d6dd057acd51/artifacts/rna-spark-streaming.jar?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-checkpoint-against-s3-tp25068.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark streaming checkpoint against s3
Posted by Tian Zhang <tz...@yahoo.com>.
So as long as the jar is kept on S3 and available across different runs, the S3
checkpointing works.
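(For anyone hitting the same thing: one way to keep the jar reachable across runs is to submit it from S3 rather than from a local path, so the location recorded in the checkpoint stays valid after a restart. A sketch, where the main class name is a placeholder:

```
# Submit the application jar from S3 so the jar path Spark records
# remains resolvable when the context is rebuilt from the checkpoint.
spark-submit \
  --class com.example.MyStreamingJob \
  s3://myBucket/jars/rna-spark-streaming.jar
```

The exact flags depend on your cluster manager; the point is only that the jar URI must outlive any single run.)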
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-checkpoint-against-s3-tp25068p25081.html
Re: Spark streaming checkpoint against s3
Posted by Tian Zhang <tz...@yahoo.com>.
It looks like the reconstruction of SparkContext from checkpoint data tries to
look for the jar files of previous failed runs. It cannot find them because our
jar files are on local machines and were cleaned up after each failed run.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-checkpoint-against-s3-tp25068p25070.html