Posted to issues@spark.apache.org by "Stephen Durfey (Jira)" <ji...@apache.org> on 2022/08/09 20:33:00 UTC

[jira] [Created] (SPARK-40024) PostgresDialect Doesn't handle arrays of custom data types after postgresql driver version 42.2.22

Stephen Durfey created SPARK-40024:
--------------------------------------

             Summary: PostgresDialect Doesn't handle arrays of custom data types after postgresql driver version 42.2.22
                 Key: SPARK-40024
                 URL: https://issues.apache.org/jira/browse/SPARK-40024
             Project: Spark
          Issue Type: Task
          Components: SQL
    Affects Versions: 3.1.1
            Reporter: Stephen Durfey


Starting in driver version 42.2.23 (also 42.3.x and 42.4.x), the sql type the postgresql driver reports for columns holding an array of custom data types (e.g. an array of enums) is now `ARRAY`. Prior to this version the driver returned the type `CUSTOM`. PostgresDialect can handle custom types and array types, but not arrays of custom types: the element types supported within arrays are limited to those mapped to a catalyst type here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala#L69-L98. Since a custom type won't match any of those, `None` is returned and eventually `JdbcUtils` throws this exception:

```
java.sql.SQLException: Unsupported type ARRAY
  at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedJdbcTypeError(QueryExecutionErrors.scala:682)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:249)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:327)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:327)
```
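
For context, here is an abridged sketch of the shape of the dialect's mapping (not the verbatim Spark source): for `java.sql.Types.ARRAY`, the Postgres type name (e.g. `_mood`) has its leading underscore stripped and the element name is looked up against a fixed list of built-in type names, so a custom element name falls through to `None`:

```scala
import java.sql.Types
import org.apache.spark.sql.types._

// Abridged sketch of the logic in Spark's PostgresDialect (not verbatim).
object PostgresDialectSketch {
  def getCatalystType(sqlType: Int, typeName: String, size: Int): Option[DataType] =
    sqlType match {
      case Types.ARRAY =>
        // Postgres prefixes array type names with '_': "_int4", "_mood", ...
        toCatalystType(typeName.drop(1)).map(ArrayType(_))
      // ... scalar cases elided; a bare custom type (Types.OTHER) maps to StringType ...
      case _ => None
    }

  private def toCatalystType(typeName: String): Option[DataType] = typeName match {
    case "int4"             => Some(IntegerType)
    case "text" | "varchar" => Some(StringType)
    // ... the remaining built-in names elided ...
    case _ => None // a custom element type such as an enum lands here
  }
}
```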

The postgresql driver change was part of pgjdbc issue #1948: https://github.com/pgjdbc/pgjdbc/issues/1948.

I made a change locally to return `StringType` instead of `None` for the default case, and that worked fine, but I don't know whether that's the desired solution.
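
Concretely, the local workaround amounts to changing the fallback in the element-type lookup (sketch only; the surrounding code is elided):

```scala
// Sketch of the local workaround: default the element type to StringType
// instead of None, so arrays of custom types map to ArrayType(StringType).
private def toCatalystType(typeName: String): Option[DataType] = typeName match {
  // ... built-in mappings unchanged ...
  case _ => Some(StringType) // was: case _ => None
}
```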

I created a gist with a code snippet that reproduces the issue when run via spark-shell: https://gist.github.com/sdurfey/f9e73cffaeb90cd9c69dcc771fe59f08
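
The gist has the full version; a minimal sketch of that kind of reproduction looks like the following (the table, database, and connection details are hypothetical):

```scala
// Hypothetical reproduction for spark-shell (the `spark` session is provided
// by the shell); the postgresql driver (>= 42.2.23) must be on the classpath.
//
// Postgres setup (psql):
//   CREATE TYPE mood AS ENUM ('happy', 'sad');
//   CREATE TABLE people (name text, moods mood[]);
//   INSERT INTO people VALUES ('alice', '{happy,sad}');

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "people")
  .option("user", "postgres")
  .option("password", "postgres")
  .load()

// Schema resolution fails with: java.sql.SQLException: Unsupported type ARRAY
df.printSchema()
```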


