Posted to issues@spark.apache.org by "Jesús Ricardo Ballesteros Molina (Jira)" <ji...@apache.org> on 2021/08/02 15:02:00 UTC

[jira] [Commented] (SPARK-36325) Writing to hiveserver throught jdbc throws ParseException

    [ https://issues.apache.org/jira/browse/SPARK-36325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391638#comment-17391638 ] 

Jesús Ricardo Ballesteros Molina commented on SPARK-36325:
----------------------------------------------------------

Hello, first of all thank you for your reply. I used the dialect, but now I have another issue, and this one I don't know how to address.

{code:java}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, StringType}

val HiveDialect = new JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hive2") || url.contains("hive2")

  // Return the column name unquoted: Hive rejects double-quoted identifiers
  override def quoteIdentifier(colName: String): String = colName

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Option(JdbcType("STRING", Types.VARCHAR))
    case _          => None
  }
}

JdbcDialects.registerDialect(HiveDialect)

df_linux.write.mode("overwrite")
  .format("jdbc")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("url", "jdbc:hive2://sa3secessuperset01.a3sec.local:10000")
  .option("dbtable", "o365new")
  //.option("createTableColumnTypes", "_time VARCHAR(1024), raw_log VARCHAR(1024), service_name VARCHAR(1024), hostname VARCHAR(1024), pid VARCHAR(1024), username VARCHAR(1024), source_ip VARCHAR(1024)")
  .option("createTableColumnTypes", "time STRING, raw_log STRING, service_name STRING, hostname STRING, pid STRING, username STRING, source_ip STRING")
  .save()
{code}
 

I get this error:

 
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 32.0 failed 4 times, most recent failure: Lost task 0.3 in stage 32.0 (TID 423) (10.103.0.118 executor 2): java.sql.SQLFeatureNotSupportedException: Method not supported
 at org.apache.hive.jdbc.HivePreparedStatement.addBatch(HivePreparedStatement.java:78)
 at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:683)
 at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
 at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
 
{code}
 

Maybe JDBC is not the way to write through the Thrift server, but I don't know how else to do it. At the moment I am using another database, but I really want to use Spark SQL. If you think I should close this issue and maybe reopen it as something else, feel free to close the ticket.
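
For what it's worth, one possible workaround (a sketch only, and it assumes the Spark session running this code shares its metastore with the Thrift server, so that a table saved here is visible to clients connecting over jdbc:hive2://) is to skip the JDBC writer entirely and write through the catalog, then let Superset read the table via Thrift:

{code:java}
// Sketch, under the assumption above: persist the dataframe as a managed
// table in the shared metastore instead of pushing rows through the Hive
// JDBC driver (whose PreparedStatement does not support addBatch).
df_linux.write
  .mode("overwrite")
  .saveAsTable("o365new")

// Equivalent route via a temporary view and SQL:
df_linux.createOrReplaceTempView("o365new_tmp")
spark.sql("CREATE TABLE o365new AS SELECT * FROM o365new_tmp")
{code}

This avoids the row-by-row JDBC insert path altogether, which is where the addBatch failure occurs.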


> Writing to hiveserver throught jdbc throws ParseException
> ---------------------------------------------------------
>
>                 Key: SPARK-36325
>                 URL: https://issues.apache.org/jira/browse/SPARK-36325
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>         Environment: OS: Debian 10
> Spark version: 3.1.2
> Zeppelin Notebook: 0.9.0
> Jdbc driver:  org.apache.hive:hive-jdbc:3.1.2  
>            Reporter: Jesús Ricardo Ballesteros Molina
>            Priority: Major
>              Labels: spark, spark-sql
>
> Hello everyone, I am new to Spark and this is my first post. If I make a mistake, please be kind to me; I have searched the web and haven't found anything related. If this bug is a duplicate or similar, please feel free to close it and tell me where to look.
> I am working with Zeppelin: I got a dataframe from the Solr API, processed it, and now I want to write it to a table through Thrift and read that new table from Apache Superset.
>  
> I have this df with this schema:
> {code:java}
> %spark
> df_linux.printSchema()
> root
>  |-- time: string (nullable = false)
>  |-- raw_log: string (nullable = false)
>  |-- service_name: string (nullable = false)
>  |-- hostname: string (nullable = false)
>  |-- pid: string (nullable = false)
>  |-- username: string (nullable = false)
>  |-- source_ip: string (nullable = false)
> {code}
>  
> And this content:
>  
> {code:java}
> %spark
> df_linux.show()
> +--------------------+--------------------+------------+------------------+-----+--------+---------+
> | time| raw_log|service_name| hostname| pid|username|source_ip|
> +--------------------+--------------------+------------+------------------+-----+--------+---------+
> |2021-07-28T07:41:53Z|Jul 28 07:41:52 s...| sshd[11611]|sa3secessuperset01|11611| debian| 10.0.9.3|
> |2021-07-28T07:41:44Z|Jul 28 07:41:43 s...| sshd[11590]|sa3secessuperset01|11590| debian| 10.0.9.3|
> |2021-07-27T08:46:11Z|Jul 27 08:46:10 s...| sshd[16954]|sa3secessuperset01|16954| debian| 10.0.9.3|
> |2021-07-27T08:44:55Z|Jul 27 08:44:54 s...| sshd[16511]|sa3secessuperset01|16511| debian| 10.0.9.3|
> |2021-07-27T08:30:03Z|Jul 27 08:30:02 s...| sshd[14511]|sa3secessuperset01|14511| debian| 10.0.9.3|
> +--------------------+--------------------+------------+------------------+-----+--------+---------+
> {code}
>  
> When I write the dataframe through JDBC I get this error:
>  
>  
> {code:java}
> df_linux.write.mode("overwrite")
>  .format("jdbc")
>  .option("driver","org.apache.hive.jdbc.HiveDriver")
>  .option("url", "jdbc:hive2://sa3secessuperset01.a3sec.local:10000")
>  .option("dbtable", "o365new")
>  .option("createTableColumnTypes", "time VARCHAR(1024) NOT NULL, raw_log VARCHAR(1024) NOT NULL, service_name VARCHAR(1024), hostname VARCHAR(1024), pid VARCHAR(1024), username VARCHAR(1024), source_ip STRING")
>  .save()
> java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException:
> no viable alternative at input '("time"'(line 1, pos 22)== SQL ==
> CREATE TABLE o365new ("time" varchar(1024) NOT NULL, "raw_log" varchar(1024) NOT NULL, "service_name" varchar(1024) NOT NULL, "hostname" varchar(1024) NOT NULL, "pid" varchar(1024) NOT NULL, "username" varchar(1024) NOT NULL, "source_ip" string NOT NULL)
> ----------------------^^^{code}
>  
> From what I have seen, the problem is the way it tries to create the table. If you run the generated SQL statement in Beeline, it throws exactly the same error:
>  
> {code:java}
> %hive
> CREATE TABLE o365new ("time" varchar(1024) NOT NULL, "raw_log" varchar(1024) NOT NULL, "service_name" varchar(1024) NOT NULL, "hostname" varchar(1024) NOT NULL, "pid" varchar(1024) NOT NULL, "username" varchar(1024) NOT NULL, "source_ip" string NOT NULL);
> org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: 
> no viable alternative at input '("time"'(line 2, pos 22{code}
>  
>  
> Then I just removed the quotes and the table was created without any problem:
>  
> {code:java}
> %hive
> CREATE TABLE o365new (time varchar(1024) NOT NULL, raw_log varchar(1024) NOT NULL, service_name varchar(1024) NOT NULL, hostname varchar(1024) NOT NULL, pid varchar(1024) NOT NULL, username varchar(1024) NOT NULL, source_ip string NOT NULL)
> {code}
>  
>  
> So the problem is the quotes; that's why I think this is a bug. But I don't know how to "override" the generated query the way I can with "createTableColumnTypes". Maybe this is not the right way to work and there is another approach, but I don't know it.
>  
> Best regards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org