Posted to user@spark.apache.org by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/13 17:20:20 UTC

RE: jdbc driver used by spark fails following first stage, solved it

Like many things, it is not that straightforward!

 

You need to explicitly reference the Oracle jar file with the --jars switch as well: --driver-class-path only puts the jar on the driver's classpath, while --jars ships it to the executors too.

 

spark-shell --master yarn --deploy-mode client \
  --driver-class-path /home/hduser/jars/ojdbc6.jar \
  --jars /home/hduser/jars/ojdbc6.jar
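
With the jar shipped to the executors as well, the tasks can load the Oracle driver themselves. A minimal sketch of an end-to-end check from spark-shell (the table name, user and password below are placeholders, not from the original session):

val df = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:oracle:thin:@rhes564:1521:mydb",
  "dbtable"  -> "HR.DEPARTMENTS",            // placeholder table
  "user"     -> "hr",                        // placeholder credentials
  "password" -> "***",
  "driver"   -> "oracle.jdbc.OracleDriver"   // register the driver explicitly
)).load()

df.foreach(println)   // a row-level action runs on the executors, which is where the driver used to be missing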

 

HTH

 

Dr Mich Talebzadeh

 

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is
the responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Technology Ltd, its subsidiaries nor their
employees accept any responsibility.

 

 

From: Mich Talebzadeh [mailto:mich@peridale.co.uk] 
Sent: 13 February 2016 15:25
To: user@spark.apache.org
Subject: jdbc driver used by spark fails following first stage

 

Hi,

 

I start my spark-shell with --driver-class-path /home/hduser/jars/ojdbc6.jar

 

It finds the driver, and any schema read returns the correct structure for the
Oracle tables. Even when I join tables I can see the joined schema:

 

scala> empDepartments.printSchema()

root

|-- DEPARTMENT_ID: decimal(4,0) (nullable = false)

|-- DEPARTMENT_NAME: string (nullable = false)

|-- MANAGER_ID: decimal(6,0) (nullable = true)

|-- LOCATION_ID: decimal(4,0) (nullable = true)

|-- DEPARTMENT_ID: decimal(4,0) (nullable = false)

|-- DEPARTMENT_NAME: string (nullable = false)

|-- MANAGER_ID: decimal(6,0) (nullable = true)

|-- LOCATION_ID: decimal(4,0) (nullable = true)
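
For context, empDepartments is a JDBC read followed by a join, along these lines (a minimal sketch; the table name, credentials and join condition are illustrative, not the exact code used):

val opts = Map(
  "url"      -> "jdbc:oracle:thin:@rhes564:1521:mydb",
  "dbtable"  -> "HR.DEPARTMENTS",   // illustrative table name
  "user"     -> "hr",               // illustrative credentials
  "password" -> "***")

// Reading the table twice and joining produces the duplicated DEPARTMENT_* columns above
val dept1 = sqlContext.read.format("jdbc").options(opts).load()
val dept2 = sqlContext.read.format("jdbc").options(opts).load()
val empDepartments = dept1.join(dept2, dept1("MANAGER_ID") === dept2("MANAGER_ID"))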

 

 

However, any operation dealing with the rows themselves fails, as shown below.

 

scala> empDepartments.foreach(println)

 

16/02/13 15:32:56 INFO SparkContext: Starting job: foreach at <console>:37
16/02/13 15:32:56 INFO DAGScheduler: Got job 5 (foreach at <console>:37) with 200 output partitions
16/02/13 15:32:56 INFO DAGScheduler: Final stage: ResultStage 11(foreach at <console>:37)
16/02/13 15:32:56 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 9, ShuffleMapStage 10)
16/02/13 15:32:56 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 9, ShuffleMapStage 10)
16/02/13 15:32:56 INFO DAGScheduler: Submitting ShuffleMapStage 9 (MapPartitionsRDD[12] at foreach at <console>:37), which has no missing parents
16/02/13 15:32:57 INFO MemoryStore: ensureFreeSpace(8136) called with curMem=44967, maxMem=555684986
16/02/13 15:32:57 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 7.9 KB, free 529.9 MB)
16/02/13 15:32:57 INFO MemoryStore: ensureFreeSpace(3976) called with curMem=53103, maxMem=555684986
16/02/13 15:32:57 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.9 KB, free 529.9 MB)
16/02/13 15:32:57 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 50.140.197.217:40741 (size: 3.9 KB, free: 529.9 MB)
16/02/13 15:32:57 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:861
16/02/13 15:32:57 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 9 (MapPartitionsRDD[12] at foreach at <console>:37)
16/02/13 15:32:57 INFO YarnScheduler: Adding task set 9.0 with 1 tasks
16/02/13 15:32:57 INFO DAGScheduler: Submitting ShuffleMapStage 10 (MapPartitionsRDD[8] at foreach at <console>:37), which has no missing parents
16/02/13 15:32:57 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 27, rhes564, PROCESS_LOCAL, 1918 bytes)
16/02/13 15:32:57 INFO MemoryStore: ensureFreeSpace(8136) called with curMem=57079, maxMem=555684986
16/02/13 15:32:57 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 7.9 KB, free 529.9 MB)
16/02/13 15:32:57 INFO MemoryStore: ensureFreeSpace(3978) called with curMem=65215, maxMem=555684986
16/02/13 15:32:57 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 3.9 KB, free 529.9 MB)
16/02/13 15:32:57 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 50.140.197.217:40741 (size: 3.9 KB, free: 529.9 MB)
16/02/13 15:32:57 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:861
16/02/13 15:32:57 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 10 (MapPartitionsRDD[8] at foreach at <console>:37)
16/02/13 15:32:57 INFO YarnScheduler: Adding task set 10.0 with 1 tasks
16/02/13 15:32:57 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 28, rhes564, PROCESS_LOCAL, 1918 bytes)
16/02/13 15:32:57 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on rhes564:23270 (size: 3.9 KB, free: 1589.7 MB)
16/02/13 15:32:57 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on rhes564:23270 (size: 3.9 KB, free: 1589.7 MB)
16/02/13 15:32:57 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 27, rhes564): java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:@rhes564:1521:mydb
        at java.sql.DriverManager.getConnection(DriverManager.java:596)
        at java.sql.DriverManager.getConnection(DriverManager.java:187)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:188)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:181)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:360)

 

 

Mich Talebzadeh

 
