Posted to issues@spark.apache.org by "Matthew Walton (JIRA)" <ji...@apache.org> on 2017/06/22 13:00:00 UTC

[jira] [Created] (SPARK-21179) Unable to return Hive INT data type into Spark SQL via Hive JDBC driver: Caused by: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.

Matthew Walton created SPARK-21179:
--------------------------------------

             Summary: Unable to return Hive INT data type into Spark SQL via Hive JDBC driver:  Caused by: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.  
                 Key: SPARK-21179
                 URL: https://issues.apache.org/jira/browse/SPARK-21179
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell, SQL
    Affects Versions: 2.0.0, 1.6.0
         Environment: OS:  Linux
HDP version 2.5.0.1-60
Hive version: 1.2.1
Spark  version 2.0.0.2.5.0.1-60
JDBC:  Download the latest Hortonworks JDBC driver
            Reporter: Matthew Walton


I'm trying to fetch data back into Spark SQL using a JDBC connection to Hive. Unfortunately, when I query data that resides in an INT column, I get the following error:

17/06/22 12:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.  

Steps to reproduce:

1) On Hive create a simple table with an INT column and insert some data (I used SQuirreL Client with the Hortonworks JDBC driver):

create table wh2.hivespark (country_id int, country_name string);
insert into wh2.hivespark values (1, 'USA');

2) Copy the Hortonworks Hive JDBC driver jars to the machine where you will run the Spark shell

3) Start the Spark shell, loading the Hortonworks Hive JDBC driver jar files:

./spark-shell --jars /home/spark/jdbc/hortonworkshive/HiveJDBC41.jar,/home/spark/jdbc/hortonworkshive/TCLIServiceClient.jar,/home/spark/jdbc/hortonworkshive/commons-codec-1.3.jar,/home/spark/jdbc/hortonworkshive/commons-logging-1.1.1.jar,/home/spark/jdbc/hortonworkshive/hive_metastore.jar,/home/spark/jdbc/hortonworkshive/hive_service.jar,/home/spark/jdbc/hortonworkshive/httpclient-4.1.3.jar,/home/spark/jdbc/hortonworkshive/httpcore-4.1.3.jar,/home/spark/jdbc/hortonworkshive/libfb303-0.9.0.jar,/home/spark/jdbc/hortonworkshive/libthrift-0.9.0.jar,/home/spark/jdbc/hortonworkshive/log4j-1.2.14.jar,/home/spark/jdbc/hortonworkshive/ql.jar,/home/spark/jdbc/hortonworkshive/slf4j-api-1.5.11.jar,/home/spark/jdbc/hortonworkshive/slf4j-log4j12-1.5.11.jar,/home/spark/jdbc/hortonworkshive/zookeeper-3.4.6.jar

4) In the Spark shell, load the data from Hive using the JDBC driver:

val hivespark = spark.read
  .format("jdbc")
  .options(Map(
    "url" -> "jdbc:hive2://localhost:10000/wh2;AuthMech=3;UseNativeQuery=1;user=hdfs;password=hdfs",
    "dbtable" -> "wh2.hivespark"))
  .option("driver", "com.simba.hive.jdbc41.HS2Driver")
  .option("user", "hdfs")
  .option("password", "hdfs")
  .load()

5) In the Spark shell, try to display the data:

hivespark.show()

At this point you should see the error:

scala> hivespark.show()
17/06/22 12:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.
        at com.simba.hiveserver2.exceptions.ExceptionConverter.toSQLException(Unknown Source)
        at com.simba.hiveserver2.utilities.conversion.TypeConverter.toInt(Unknown Source)
        at com.simba.hiveserver2.jdbc.common.SForwardResultSet.getInt(Unknown Source)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:437)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:535)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
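
To help isolate whether the conversion fails in the driver itself or only on Spark's JDBCRDD read path, one option is a plain-JDBC probe outside Spark. The sketch below is untested here: it assumes the same Simba driver jar is on the classpath and that HiveServer2 is reachable at the URL from the report, and it only defines the probe without invoking it.

```scala
// Hedged isolation sketch (not run here): call ResultSet.getInt through plain
// JDBC, outside Spark, against the same table and driver from the report.
import java.sql.DriverManager

val probeUrl   = "jdbc:hive2://localhost:10000/wh2;AuthMech=3;UseNativeQuery=1"
val probeQuery = "SELECT country_id FROM wh2.hivespark"

def probe(): Unit = {
  val conn = DriverManager.getConnection(probeUrl, "hdfs", "hdfs")
  try {
    val rs = conn.createStatement().executeQuery(probeQuery)
    while (rs.next()) {
      // If getInt throws SQLDataException (10140) here too, the conversion
      // fails inside the driver; if it succeeds, the problem is Spark-side.
      println(rs.getInt(1))
    }
  } finally conn.close()
}
```

If the probe's getInt succeeds where Spark's show() fails, that would point at how Spark reads the result set rather than at the driver's type conversion.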

Note: I also tested this issue using a JDBC driver from Progress DataDirect and saw a similar error message, so this does not appear to be driver-specific.

scala> hivespark.show()
17/06/22 12:07:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.sql.SQLException: [DataDirect][Hive JDBC Driver]Value can not be converted to requested type.

Also, if I query this table directly from the SQuirreL client tool, there is no error.
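
As a possible workaround until the conversion issue is understood, the INT-to-STRING cast can be pushed down to Hive by putting a subquery in the "dbtable" option, so the driver never has to call getInt on the result set. This is a hedged sketch, untested against a live HiveServer2; the table and column names (wh2.hivespark, country_id, country_name) are taken from the report.

```scala
// Workaround sketch: read through a casting subquery instead of the raw table.
val castQuery =
  "(SELECT CAST(country_id AS STRING) AS country_id, country_name " +
    "FROM wh2.hivespark) t"

val jdbcOptions = Map(
  "url"      -> "jdbc:hive2://localhost:10000/wh2;AuthMech=3;UseNativeQuery=1",
  "driver"   -> "com.simba.hive.jdbc41.HS2Driver",
  "dbtable"  -> castQuery,
  "user"     -> "hdfs",
  "password" -> "hdfs"
)

// In spark-shell, the options would be applied like this:
//   val hivespark = spark.read.format("jdbc").options(jdbcOptions).load()
//   hivespark.show()
```

The trade-off is that country_id arrives in Spark as a string and would need a Spark-side cast back to int if numeric semantics are required.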



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
