Posted to user@spark.apache.org by Tian Zhang <tz...@yahoo.com> on 2015/10/14 22:41:58 UTC
Spark streaming checkpoint against s3
Hi, I am trying to point the Spark Streaming checkpoint at S3. Here is basically what I did:
val checkpointDir = "s3://myBucket/checkpoint"
val ssc = StreamingContext.getOrCreate(checkpointDir,
  () => getStreamingContext(sparkJobName,
                            batchDurationSec,
                            classOf[MyClassKryoRegistrator],
                            checkpointDir),
  getHadoopConfiguration)

def getHadoopConfiguration: Configuration = {
  val hadoopConf = new Configuration()
  hadoopConf.set("fs.defaultFS", "s3://" + myBucket + "/")
  hadoopConf.set("fs.s3.awsAccessKeyId", "myAccessKey")
  hadoopConf.set("fs.s3.awsSecretAccessKey", "mySecretKey")
  hadoopConf.set("fs.s3n.awsAccessKeyId", "myAccessKey")
  hadoopConf.set("fs.s3n.awsSecretAccessKey", "mySecretKey")
  hadoopConf
}
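(For reference, Spark can also forward these settings for you: any property prefixed with spark.hadoop. in the Spark configuration is copied into the Hadoop Configuration that Spark builds, so the credentials do not have to be set in code. A minimal sketch, assuming the same placeholder keys as above:

```
# spark-defaults.conf -- placeholder values; spark.hadoop.* properties
# are passed through to the underlying Hadoop Configuration.
spark.hadoop.fs.s3n.awsAccessKeyId     myAccessKey
spark.hadoop.fs.s3n.awsSecretAccessKey mySecretKey
```

Whether this also covers the Configuration used to read the checkpoint on recovery depends on the Spark version, so the explicit getHadoopConfiguration above is the safer option.)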
It works in the sense that I can see it trying to retrieve the checkpoint from S3.
However, it did more than I intended; I saw the following in the log:
15/10/14 19:58:47 ERROR spark.SparkContext: Jar not found at
file:/media/ephemeral0/oncue/mesos-slave/slaves/20151007-172900-436893194-5050-2984-S9/frameworks/20150825-180042-604730890-5050-4268-0003/executors/tian-act-reg.47368a1a-71f9-11e5-ad61-de5fb3a867da/runs/dfc28a6c-48a0-464b-bdb1-d6dd057acd51/artifacts/rna-spark-streaming.jar
Now SparkContext is trying to look up the following path instead of the local file:
file:/media/ephemeral0/oncue/mesos-slave/slaves/20151007-172900-436893194-5050-2984-S9/frameworks/20150825-180042-604730890-5050-4268-0003/executors/tian-act-reg.47368a1a-71f9-11e5-ad61-de5fb3a867da/runs/dfc28a6c-48a0-464b-bdb1-d6dd057acd51/artifacts/rna-spark-streaming.jar
How do I get SparkContext to look at just
/media/ephemeral0/oncue/mesos-slave/slaves/20151007-172900-436893194-5050-2984-S9/frameworks/20150825-180042-604730890-5050-4268-0003/executors/tian-act-reg.47368a1a-71f9-11e5-ad61-de5fb3a867da/runs/dfc28a6c-48a0-464b-bdb1-d6dd057acd51/artifacts/rna-spark-streaming.jar?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-checkpoint-against-s3-tp25068.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark streaming checkpoint against s3
Posted by Tian Zhang <tz...@yahoo.com>.
So as long as the jar is kept on S3 and available across different runs, the S3
checkpointing works.
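(For anyone hitting the same thing: one way to keep the jar reachable across runs is to submit it from S3 rather than from a local path, so the location recorded in the checkpoint stays valid after a restart. A sketch, where the main class name is a placeholder:

```
# Submit the application jar from S3 so the jar path Spark records
# remains resolvable when the context is rebuilt from the checkpoint.
spark-submit \
  --class com.example.MyStreamingJob \
  s3://myBucket/jars/rna-spark-streaming.jar
```

The exact flags depend on your cluster manager; the point is only that the jar URI must outlive any single run.)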
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-checkpoint-against-s3-tp25068p25081.html
Re: Spark streaming checkpoint against s3
Posted by Tian Zhang <tz...@yahoo.com>.
It looks like the reconstruction of SparkContext from checkpoint data tries to
look for the jar files of previous failed runs. It cannot find them because our
jar files are on local machines and were cleaned up after each failed run.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-checkpoint-against-s3-tp25068p25070.html