Posted to user@spark.apache.org by Aaron <aa...@gmail.com> on 2015/07/20 19:14:13 UTC

Spark 1.4.1, MySQL and DataFrameReader.read.jdbc fun

I have Spark 1.4.1 running on a YARN cluster.  When I launch pyspark
in yarn-client mode:

pyspark --jars ~/dev/spark/lib/mysql-connector-java-5.1.36-bin.jar \
  --driver-class-path ~/dev/spark/lib/mysql-connector-java-5.1.36-bin.jar

and then run the equivalent of:

tbl = sqlContext.read.jdbc("jdbc:mysql://....", "tableName",
                           properties={"user": "blah", "password": "pw"})


I get the "No Suitable Driver found" error when I attempted to do a

tbl.show()  or maybe a tbl.describe(), etc.  This even happens in the
spark-shell too.
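
For concreteness, the smallest repro I can give is below; the host,
port, database, and table name are placeholders, not my real ones:

# Run inside the pyspark shell launched above (sqlContext is predefined).
url = "jdbc:mysql://dbhost:3306/mydb"   # placeholder host/database
props = {"user": "blah", "password": "pw"}

tbl = sqlContext.read.jdbc(url, "tableName", properties=props)  # succeeds
tbl.show()  # fails here with the "No suitable driver found" error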


Currently I do NOT set SPARK_CLASSPATH (I've seen it suggested, but I
know it is deprecated).  I also do NOT set the
spark.executor.extraClassPath property, because I thought shipping the
JAR to the executors was the whole point of the --jars option.

So, do I need to deploy the MySQL connector JAR to a known location on
each of my YARN node managers, and then reference that location
somewhere?  If so, which command-line options do I use, or which
properties do I set?
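
If a fixed per-node path is the answer, I'm guessing the launch would
look something like the following (/opt/jars is a hypothetical
location; I haven't verified this):

pyspark \
  --driver-class-path /opt/jars/mysql-connector-java-5.1.36-bin.jar \
  --conf spark.executor.extraClassPath=/opt/jars/mysql-connector-java-5.1.36-bin.jar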

I thought the --jars command-line option shipped the JARs and put them
on the classpath to be used; is this not the case?

Another question: why do I need --driver-class-path <location of mysql
jar>?  If I omit that option, I get an error on the
sqlContext.read.jdbc() assignment itself, before I even attempt an
action on the DataFrame.
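
Related: I've seen it suggested that naming the driver class explicitly
in the connection properties helps with driver resolution. Something
like the sketch below (untested by me; the URL is again a placeholder):

# Explicit "driver" key, per suggestions I've seen -- unverified on 1.4.1.
props = {"user": "blah",
         "password": "pw",
         "driver": "com.mysql.jdbc.Driver"}
tbl = sqlContext.read.jdbc("jdbc:mysql://dbhost:3306/mydb", "tableName",
                           properties=props)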


Cheers,
Aaron

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org