Posted to user@spark.apache.org by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/12 17:16:24 UTC

spark-shell throws JDBC error after load

I have resolved the hanging issue below by running in yarn-client mode as follows:

 

spark-shell --master yarn --deploy-mode client --driver-class-path /home/hduser/jars/ojdbc6.jar

 

val channels = sqlContext.read.format("jdbc").options(
    Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
    "dbtable" -> "(select * from sh.channels where channel_id = 14)",
    "user" -> "sh",
    "password" -> "sh")).load

channels.show

 

 

But now I am getting the following error when I call channels.show:

 

 

16/02/12 16:03:37 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, rhes564, PROCESS_LOCAL, 1929 bytes)
16/02/12 16:03:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on rhes564:33141 (size: 2.7 KB, free: 1589.8 MB)
16/02/12 16:03:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, rhes564): java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb
        at java.sql.DriverManager.getConnection(DriverManager.java:596)
        at java.sql.DriverManager.getConnection(DriverManager.java:187)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:188)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:181)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:360)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:352)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

 

16/02/12 16:03:38 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, rhes564, PROCESS_LOCAL, 1929 bytes)
16/02/12 16:03:38 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) on executor rhes564: java.sql.SQLException (No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb) [duplicate 1]
16/02/12 16:03:38 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, rhes564, PROCESS_LOCAL, 1929 bytes)
16/02/12 16:03:38 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2) on executor rhes564: java.sql.SQLException (No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb) [duplicate 2]
16/02/12 16:03:38 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, rhes564, PROCESS_LOCAL, 1929 bytes)
16/02/12 16:03:38 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on executor rhes564: java.sql.SQLException (No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb) [duplicate 3]
16/02/12 16:03:38 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
16/02/12 16:03:38 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/02/12 16:03:38 INFO YarnScheduler: Cancelling stage 0
16/02/12 16:03:38 INFO DAGScheduler: ResultStage 0 (show at <console>:26) failed in 1.182 s
16/02/12 16:03:38 INFO DAGScheduler: Job 0 failed: show at <console>:26, took 1.316319 s
16/02/12 16:03:39 INFO SparkContext: Invoking stop() from shutdown hook
16/02/12 16:03:39 INFO SparkUI: Stopped Spark web UI at http://50.140.197.217:4040
16/02/12 16:03:39 INFO DAGScheduler: Stopping DAGScheduler
16/02/12 16:03:39 INFO YarnClientSchedulerBackend: Interrupting monitor thread
16/02/12 16:03:39 INFO YarnClientSchedulerBackend: Shutting down all executors
16/02/12 16:03:39 INFO YarnClientSchedulerBackend: Asking each executor to shut down
16/02/12 16:03:39 INFO YarnClientSchedulerBackend: Stopped
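For context: the exception is raised on the executors, and --driver-class-path only puts ojdbc6.jar on the driver JVM's classpath, so the executors never see the Oracle driver. A commonly suggested workaround (a sketch, not verified against this cluster; it assumes the same jar path is readable on the submitting host) is to also ship the jar with --jars and name the driver class explicitly in the options map:

```scala
// Launch so that both the driver and the executors get the Oracle jar:
//   spark-shell --master yarn --deploy-mode client \
//     --driver-class-path /home/hduser/jars/ojdbc6.jar \
//     --jars /home/hduser/jars/ojdbc6.jar

// Then, in the shell, add the "driver" option so Spark loads the class
// directly instead of relying on DriverManager's classpath scan on the
// executors ("oracle.jdbc.OracleDriver" is the class shipped in ojdbc6.jar):
val channels = sqlContext.read.format("jdbc").options(
    Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
    "driver" -> "oracle.jdbc.OracleDriver",
    "dbtable" -> "(select * from sh.channels where channel_id = 14)",
    "user" -> "sh",
    "password" -> "sh")).load
channels.show
```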

 

Dr Mich Talebzadeh

 

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is
the responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Technology Ltd, its subsidiaries nor their
employees accept any responsibility.

 

 

From: Mich Talebzadeh [mailto:mich@peridale.co.uk] 
Sent: 12 February 2016 10:45
To: user@spark.apache.org
Subject: Connection via JDBC to Oracle hangs after count call

 

Hi,

 

I use the following to connect to an Oracle database from the Spark shell (Spark 1.5.2):

 

spark-shell --master spark://50.140.197.217:7077 --driver-class-path /home/hduser/jars/ojdbc6.jar

 

In Scala I do:

 

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)

sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@f9d4387

 

scala> val channels = sqlContext.read.format("jdbc").options(
     |      Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
     |      "dbtable" -> "(select * from sh.channels where channel_id = 14)",
     |      "user" -> "sh",
     |      "password" -> "xxxxxxx")).load

channels: org.apache.spark.sql.DataFrame = [CHANNEL_ID: decimal(0,-127), CHANNEL_DESC: string, CHANNEL_CLASS: string, CHANNEL_CLASS_ID: decimal(0,-127), CHANNEL_TOTAL: string, CHANNEL_TOTAL_ID: decimal(0,-127)]

 

scala> channels.count()

 

But this last command just hangs.

 

Any ideas appreciated

 

Thanks,

 

Mich Talebzadeh

 
