You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by saurabh3d <sa...@oracle.com> on 2016/11/03 08:14:29 UTC
How to join dstream and JDBCRDD with checkpointing enabled
Hi All,
We have a spark streaming job with checkpoint enabled, it executes correctly
first time, but throw below exception when restarted from checkpoint.
org.apache.spark.SparkException: RDD transformations and actions can only be
invoked by the driver, not inside of other transformations; for example,
rdd1.map(x => rdd2.values.count() * x) is invalid because the values
transformation and count action cannot be performed inside of the rdd1.map
transformation. For more information, see SPARK-5063.
at org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$sc(RDD.scala:87)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:352)
at org.apache.spark.rdd.RDD.union(RDD.scala:565)
at
org.apache.spark.streaming.Repo$$anonfun$createContext$1.apply(Repo.scala:23)
at
org.apache.spark.streaming.Repo$$anonfun$createContext$1.apply(Repo.scala:19)
at
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
Please suggest any workaround for this issue.
Code:
String URL = "jdbc:oracle:thin:" + USERNAME + "/" + PWD + "@//" +
CONNECTION_STRING;
Map<String, String> options = ImmutableMap.of(
"driver", "oracle.jdbc.driver.OracleDriver",
"url", URL,
"dbtable", "READINGS_10K",
"fetchSize", "10000");
DataFrame OracleDB_DF = sqlContext.load("jdbc", options);
JavaPairRDD<String, Row> OracleDB_RDD = OracleDB_DF.toJavaRDD()
.mapToPair(x -> new Tuple2(x.getString(0), x));
Dstream
.transformToPair(
rdd -> rdd
.mapToPair(
record ->
new Tuple2<>(
record.getKey().toString(),
record))
.join(OracleDB_RDD))
.print();
Spark version 1.6, running in yarn cluster mode.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-join-dstream-and-JDBCRDD-with-checkpointing-enabled-tp28001.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org