Posted to dev@spark.apache.org by Dushyant Rajput <du...@gmail.com> on 2016/02/23 22:46:41 UTC

Fwd: HANA data access from SPARK

Hi,

I am writing a Python app to load data from SAP HANA.

from pyspark.sql import DataFrameReader

# 'table' holds the name of the HANA table to load.
dfr = DataFrameReader(sqlContext)
df = dfr.jdbc(url='jdbc:sap://ip_hana:30015/?user=<user>&password=<pwd>',
              table=table)
df.show()

It throws a serialization error:

py4j.protocol.Py4JJavaError: An error occurred while calling o59.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
Serialization stack:
    - object not serializable (class: com.sap.db.jdbc.topology.Host, value: <ip>:30015)
    - writeObject data (class: java.util.ArrayList)
    - object (class java.util.ArrayList, [])
    - writeObject data (class: java.util.Hashtable)
    - field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1, name: properties$1, type: class java.util.Properties)
    - object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1, <function0>)
    - field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, name: org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$getConnection, type: interface scala.Function0)
    - object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, JDBCRDD[5] at showString at NativeMethodAccessorImpl.java:-2)
    - field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class org.apache.spark.rdd.RDD)
    - object (class org.apache.spark.OneToOneDependency, org.apache.spark.OneToOneDependency@57931c92)
    - writeObject data (class: scala.collection.immutable.$colon$colon)
    - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@57931c92))
    - field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
    - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:-2)
    - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
    - object (class scala.Tuple2, (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:-2,<function2>))
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:865)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:772)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:757)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1466)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207)
    at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
    at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)

Rgds,
Dushyant.

Re: Fwd: HANA data access from SPARK

Posted by whitefalcon <ar...@hotmail.com>.
Hi Dushyant,

I saw this same error with an older HANA JDBC driver, but the error went
away when I tried a later ngdbc.jar driver file (dated May 2016). I've not
tried to ...
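
The jar also has to be visible to both the driver and the executors. A
minimal launch sketch, assuming the jar sits at /path/to/ngdbc.jar (adjust
the path for your install):

  pyspark --jars /path/to/ngdbc.jar --driver-class-path /path/to/ngdbc.jar

Here --jars distributes the jar to the cluster, and --driver-class-path makes
sure the driver JVM itself can load the JDBC driver class.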

Here's an example I did using the later driver with Spark 1.6.2 running
standalone:
http://scn.sap.com/community/hana-in-memory/blog/2016/09/09/calling-hana-views-from-apache-spark
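
For reference, here's roughly what such a read looks like in PySpark 1.6.
The host, port, table, and credentials below are placeholders, and passing
them through the properties dict rather than the URL is just a style choice
(com.sap.db.jdbc.Driver is the driver class shipped inside ngdbc.jar):

  # Sketch only: substitute your own host, schema/table, and credentials.
  df = sqlContext.read.jdbc(
      url='jdbc:sap://<hana_host>:30015',
      table='"MYSCHEMA"."MYVIEW"',
      properties={'user': '<user>',
                  'password': '<pwd>',
                  'driver': 'com.sap.db.jdbc.Driver'})
  df.show()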

Regards
Aron



