Posted to issues@spark.apache.org by "Marvin Rösch (Jira)" <ji...@apache.org> on 2022/02/25 08:04:00 UTC

[jira] [Created] (SPARK-38327) JDBC Source with MariaDB connection returns column names as values

Marvin Rösch created SPARK-38327:
------------------------------------

             Summary: JDBC Source with MariaDB connection returns column names as values
                 Key: SPARK-38327
                 URL: https://issues.apache.org/jira/browse/SPARK-38327
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
         Environment: MariaDB version 10.3.10
                      Running with spark-k8s-operator
            Reporter: Marvin Rösch


Using a JDBC source with the official MariaDB JDBC driver and a JDBC connection URL like the following does not work as expected:
{noformat}
jdbc:mariadb://db.example.com:3306/schema {noformat}
Assume we have a table "values" like the following in MariaDB:
||id (binary)||name (varchar)||
|0xAB|Name 1|
|0xBC|Name 2|

We intend to create and display a data frame from it like this:
{code:scala}
spark.read
  .format("jdbc")
  .option("url", "jdbc:mariadb://db.example.com:3306/schema")
  .option("dbtable", "values")
  .load()
  .show()
{code}
*Expected Behavior*

Using such a connection URL with an arbitrary MariaDB table or query should produce a data frame that correctly reflects the table's structure and content, with columns of the correct types holding the correct values.

The output of the above should be:
{noformat}
+----+------+
|  id|  name|
+----+------+
|[AB]|Name 1|
|[BC]|Name 2|
+----+------+{noformat}
*Observed Behavior*

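Every result row contains the column names instead of the actual values, which makes the data frame effectively useless. Even the binary {{id}} column is affected: {{[69 64]}} are the ASCII bytes of the string "id".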

The actual output is:
{noformat}
+-------+----+
|     id|name|
+-------+----+
|[69 64]|name|
|[69 64]|name|
+-------+----+{noformat}
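The {{[69 64]}} cells match the bytes of the literal string "id" (0x69 is {{i}}, 0x64 is {{d}}), which suggests the query returned the column names as string constants. A likely cause, stated here as an assumption rather than a confirmed diagnosis: Spark selects its MySQL dialect only for URLs starting with {{jdbc:mysql}}, so a {{jdbc:mariadb}} URL falls back to the generic dialect, which quotes identifiers with double quotes, and MariaDB (unless the {{ANSI_QUOTES}} SQL mode is enabled) reads {{"id"}} as a string literal. The sketch below only reproduces the byte rendering and the two quoting styles in plain Scala; {{ColumnNameAsValue}}, {{hex}}, {{quoteGeneric}} and {{quoteMySql}} are hypothetical helpers, not Spark APIs.

{code:scala}
import java.nio.charset.StandardCharsets

object ColumnNameAsValue {
  // Hypothetical helper: renders bytes as space-separated uppercase hex in brackets,
  // the same shape as the binary cells in the show() output above, e.g. [69 64].
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString("[", " ", "]")

  // Assumption: mirrors the double-quote identifier quoting of Spark's generic JDBC dialect.
  def quoteGeneric(col: String): String = "\"" + col + "\""

  // Assumption: mirrors the backtick identifier quoting of Spark's MySQL dialect.
  def quoteMySql(col: String): String = "`" + col + "`"

  def main(args: Array[String]): Unit = {
    // The bytes of the column name "id" render exactly like the observed cells.
    println(hex("id".getBytes(StandardCharsets.US_ASCII)))      // [69 64]
    // Select lists built with each quoting style.
    println(Seq("id", "name").map(quoteGeneric).mkString(", ")) // "id", "name"
    println(Seq("id", "name").map(quoteMySql).mkString(", "))   // `id`, `name`
  }
}
{code}

In MariaDB's default SQL mode the first select list yields the constant strings "id" and "name" for every row, while the backtick form refers to the actual columns.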
*Further information*

A simple workaround appears to be using {{jdbc:mysql}} instead of {{jdbc:mariadb}} in the connection URL while explicitly setting the MariaDB driver class. However, I'd expect the {{jdbc:mariadb}} URL to work out of the box.
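A minimal sketch of the workaround described above, assuming all other reader options stay the same: the helper rewrites the URL scheme to {{jdbc:mysql}} and sets the {{driver}} option to {{org.mariadb.jdbc.Driver}} so that the MariaDB driver is still the one used. {{MariaDbWorkaround}}, {{toMySqlScheme}} and {{jdbcOptions}} are hypothetical names.

{code:scala}
object MariaDbWorkaround {
  private val MariaDbPrefix = "jdbc:mariadb:"
  private val MySqlPrefix = "jdbc:mysql:"

  // Rewrites a jdbc:mariadb: URL to the jdbc:mysql: scheme so that Spark picks its
  // MySQL dialect; any other URL is returned unchanged.
  def toMySqlScheme(url: String): String =
    if (url.startsWith(MariaDbPrefix)) MySqlPrefix + url.stripPrefix(MariaDbPrefix)
    else url

  // Reader options for the workaround: mysql-scheme URL plus an explicit MariaDB driver class.
  def jdbcOptions(url: String, table: String): Map[String, String] = Map(
    "url" -> toMySqlScheme(url),
    "driver" -> "org.mariadb.jdbc.Driver",
    "dbtable" -> table
  )
}
{code}

It could then be used as {{spark.read.format("jdbc").options(MariaDbWorkaround.jdbcOptions("jdbc:mariadb://db.example.com:3306/schema", "values")).load().show()}}. Whether the driver accepts a {{jdbc:mysql}} URL depends on its version; MariaDB Connector/J 3.x reportedly requires the {{permitMysqlScheme}} URL option for this, so check the driver documentation.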

It looks like this has been an issue since at least 2016 according to a [StackOverflow post|https://stackoverflow.com/questions/38808463/incorrect-data-while-loading-jdbc-table-in-spark-sql].
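An alternative that keeps the {{jdbc:mariadb}} URL, sketched under the same quoting assumption, is to register a custom dialect through Spark's public {{JdbcDialect}} / {{JdbcDialects.registerDialect}} developer API. {{MariaDbQuotingDialect}} and {{RegisterMariaDbDialect}} are hypothetical; the dialect only changes identifier quoting and does not replicate any other MySQL-specific behaviour of Spark's built-in MySQL dialect.

{code:scala}
import java.util.Locale
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect, not part of Spark 3.2.1: it handles jdbc:mariadb URLs and
// overrides only identifier quoting, keeping every other JdbcDialect default.
object MariaDbQuotingDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:mariadb")

  // Backticks are MariaDB's native identifier quotes; a backtick inside the
  // name is escaped by doubling it.
  override def quoteIdentifier(colName: String): String =
    "`" + colName.replace("`", "``") + "`"
}

object RegisterMariaDbDialect {
  // Call once in the driver program, before the first JDBC read with a jdbc:mariadb URL.
  def register(): Unit = JdbcDialects.registerDialect(MariaDbQuotingDialect)
}
{code}

After {{RegisterMariaDbDialect.register()}}, the read from the description should quote its columns as {{`id`}} and {{`name`}} instead of {{"id"}} and {{"name"}}.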



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
