Posted to user@spark.apache.org by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/12 17:16:24 UTC
spark-shell throws JDBC error after load
I have resolved the hanging issue below by running in yarn-client mode as follows:

spark-shell --master yarn --deploy-mode client --driver-class-path /home/hduser/jars/ojdbc6.jar
val channels = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
      "dbtable" -> "(select * from sh.channels where channel_id = 14)",
      "user" -> "sh",
      "password" -> "sh")).load
channels.show
But now I am getting this error when I call channels.show:
16/02/12 16:03:37 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, rhes564, PROCESS_LOCAL, 1929 bytes)
16/02/12 16:03:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on rhes564:33141 (size: 2.7 KB, free: 1589.8 MB)
16/02/12 16:03:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, rhes564): java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:188)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:181)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:360)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:352)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
16/02/12 16:03:38 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, rhes564, PROCESS_LOCAL, 1929 bytes)
16/02/12 16:03:38 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) on executor rhes564: java.sql.SQLException (No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb) [duplicate 1]
16/02/12 16:03:38 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, rhes564, PROCESS_LOCAL, 1929 bytes)
16/02/12 16:03:38 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2) on executor rhes564: java.sql.SQLException (No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb) [duplicate 2]
16/02/12 16:03:38 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, rhes564, PROCESS_LOCAL, 1929 bytes)
16/02/12 16:03:38 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on executor rhes564: java.sql.SQLException (No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb) [duplicate 3]
16/02/12 16:03:38 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
16/02/12 16:03:38 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/02/12 16:03:38 INFO YarnScheduler: Cancelling stage 0
16/02/12 16:03:38 INFO DAGScheduler: ResultStage 0 (show at <console>:26) failed in 1.182 s
16/02/12 16:03:38 INFO DAGScheduler: Job 0 failed: show at <console>:26, took 1.316319 s
16/02/12 16:03:39 INFO SparkContext: Invoking stop() from shutdown hook
16/02/12 16:03:39 INFO SparkUI: Stopped Spark web UI at http://50.140.197.217:4040
16/02/12 16:03:39 INFO DAGScheduler: Stopping DAGScheduler
16/02/12 16:03:39 INFO YarnClientSchedulerBackend: Interrupting monitor thread
16/02/12 16:03:39 INFO YarnClientSchedulerBackend: Shutting down all executors
16/02/12 16:03:39 INFO YarnClientSchedulerBackend: Asking each executor to shut down
16/02/12 16:03:39 INFO YarnClientSchedulerBackend: Stopped
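For reference, the "No suitable driver found" is thrown on the executors, and --driver-class-path only puts the Oracle jar on the driver JVM's classpath. A sketch of a likely fix, assuming the same jar path: also ship the jar to the executors with --jars, and name the driver class explicitly in the options so DriverManager on each executor can locate it (the "driver" key and the oracle.jdbc.OracleDriver class name are the standard ones, but verify against your ojdbc6.jar):

```scala
// Launch so the jar reaches both driver and executors:
//   spark-shell --master yarn --deploy-mode client \
//     --driver-class-path /home/hduser/jars/ojdbc6.jar \
//     --jars /home/hduser/jars/ojdbc6.jar

val channels = sqlContext.read.format("jdbc").options(
  Map("url"      -> "jdbc:oracle:thin:@rhes564:1521:mydb",
      "dbtable"  -> "(select * from sh.channels where channel_id = 14)",
      "driver"   -> "oracle.jdbc.OracleDriver", // explicit driver class for the executors
      "user"     -> "sh",
      "password" -> "sh")).load
channels.show
```

This is only a sketch; it needs a running YARN cluster, the Oracle jar at that path, and a reachable database to actually execute.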
Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is
the responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Technology Ltd, its subsidiaries nor their
employees accept any responsibility.
From: Mich Talebzadeh [mailto:mich@peridale.co.uk]
Sent: 12 February 2016 10:45
To: user@spark.apache.org
Subject: Connection via JDBC to Oracle hangs after count call
Hi,
I use the following to connect to an Oracle DB from Spark shell 1.5.2:

spark-shell --master spark://50.140.197.217:7077 --driver-class-path /home/hduser/jars/ojdbc6.jar

In Scala I do:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@f9d4387
scala> val channels = sqlContext.read.format("jdbc").options(
     |   Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
     |       "dbtable" -> "(select * from sh.channels where channel_id = 14)",
     |       "user" -> "sh",
     |       "password" -> "xxxxxxx")).load
channels: org.apache.spark.sql.DataFrame = [CHANNEL_ID: decimal(0,-127), CHANNEL_DESC: string, CHANNEL_CLASS: string, CHANNEL_CLASS_ID: decimal(0,-127), CHANNEL_TOTAL: string, CHANNEL_TOTAL_ID: decimal(0,-127)]
scala> channels.count()
But this last command just hangs.

Any ideas appreciated.
Thanks,
Mich Talebzadeh