Posted to issues@spark.apache.org by "Shanmugavel Kuttiyandi Chandrakasu (Jira)" <ji...@apache.org> on 2023/01/26 02:30:00 UTC
[jira] [Created] (SPARK-42193) DataFrame API filter criteria throwing ParseException when reading a JDBC column name with special characters
Shanmugavel Kuttiyandi Chandrakasu created SPARK-42193:
----------------------------------------------------------
Summary: DataFrame API filter criteria throwing ParseException when reading a JDBC column name with special characters
Key: SPARK-42193
URL: https://issues.apache.org/jira/browse/SPARK-42193
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.3.0
Reporter: Shanmugavel Kuttiyandi Chandrakasu
*On Spark 3.3.0,* when reading from a JDBC table (SQLite used to repro) with spark.read and sqlite-jdbc:3.34.0.jar, where both the table name and a column name contain special characters, the DataFrame API filter criteria fails with a ParseException.
*Script:*
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Databricks Support") \
    .config("spark.jars.packages", "org.xerial:sqlite-jdbc:3.34.0") \
    .getOrCreate()

# Column and table names deliberately contain '/' characters
columns = ["id", "/abc/column", "value"]
data = [(1, 'A', 100), (2, 'B', 200), (3, 'B', 300)]
rdd = spark.sparkContext.parallelize(data)
df = spark.createDataFrame(rdd).toDF(*columns)

# Write the data to a SQLite table named "/abc/table"
options = {"url": "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db",
           "dbtable": '"/abc/table"',
           "driver": "org.sqlite.JDBC"}
df.coalesce(1).write.format("jdbc").options(**options).mode("append").save()

# Read the table back and filter on the special-character column
df_1 = spark.read.format("jdbc") \
    .option("url", "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db") \
    .option("dbtable", '"/abc/table"') \
    .option("driver", "org.sqlite.JDBC") \
    .load()
df_2 = df_1.filter("`/abc/column` = 'B'")
df_2.show()
{code}
Error:
{code}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/pyspark/sql/dataframe.py", line 606, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.ParseException:
Syntax error at or near '/': extra input '/'(line 1, pos 0)

== SQL ==
/abc/column
^^^
{code}
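Note that the traceback fires at df_2.show(), not at df_1.filter(...), so the backtick-quoted predicate string itself parses successfully; the == SQL == section shows the bare column name without backticks, which suggests (our assumption, we have not traced the Spark source) that the column name is re-parsed without quoting somewhere later in the 3.3.x JDBC read path. A minimal check of the first half:
{code:python}
from pyspark.sql import functions as F

# The backtick-quoted predicate parses cleanly when built directly,
# and filter() above also succeeded; the ParseException only fires
# at show(). Assumption: the bare name "/abc/column" is re-parsed
# without backticks later in the read path.
pred = F.expr("`/abc/column` = 'B'")  # no ParseException here
{code}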
However, on Spark 3.2.1 the same dataframe.filter call succeeds:
{code}
>>> df_2.show()
+---+-----------+-----+
| id|/abc/column|value|
+---+-----------+-----+
| 2| B| 200|
| 3| B| 300|
+---+-----------+-----+ {code}
Repro steps:
# Download [Spark 3.2.1|https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz] locally
# Download sqlite-jdbc:3.34.0.jar and copy it into the jars folder of the local Spark download
# Run the above script, supplying the local path to the database
# This creates */abc/table* with column */abc/column*, and the filter criteria returns results
# Download [Spark 3.3.1|https://www.apache.org/dyn/closer.lua/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz] locally
# Repeat steps 2 and 3
# The filter now fails with the ParseException shown above
Could you please let us know how we can filter on a column with special characters, or escape such a column name, on Spark 3.3.x?
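For reference, two workarounds we are experimenting with (sketches only, not verified beyond this local SQLite repro): disabling JDBC predicate pushdown via the documented pushDownPredicate option, since the failure appears to happen while the filter is being compiled for the source, and building the predicate with the Column API instead of a SQL expression string:
{code:python}
from pyspark.sql import functions as F

# Workaround 1 (assumption: the re-parse happens during predicate
# pushdown): read with pushDownPredicate=false so Spark evaluates
# the filter itself instead of compiling it for SQLite.
df_1 = spark.read.format("jdbc") \
    .option("url", "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db") \
    .option("dbtable", '"/abc/table"') \
    .option("driver", "org.sqlite.JDBC") \
    .option("pushDownPredicate", "false") \
    .load()

# Workaround 2: resolve the column by name through the Column API
# rather than parsing a SQL expression string.
df_2 = df_1.filter(F.col("/abc/column") == "B")
df_2.show()
{code}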