Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/03 18:08:34 UTC

[GitHub] [spark] sarutak commented on a change in pull request #31442: [SPARK-34333][SQL] Fix PostgresDialect to handle money types properly

sarutak commented on a change in pull request #31442:
URL: https://github.com/apache/spark/pull/31442#discussion_r569634360



##########
File path: docs/sql-migration-guide.md
##########
@@ -24,6 +24,8 @@ license: |
 
 ## Upgrading from Spark SQL 3.1 to 3.2
 
+  - In Spark 3.2, the money type in a PostgreSQL table is converted to `StringType`, and the money[] type is not supported, because the JDBC driver for PostgreSQL can't handle those types properly.

Review comment:
       > Why not a string array for a money array?
   
   For the money type, Spark SQL calls `PgResultSet.getDouble`, which causes the following error:
   ```
   [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.204 executor driver): org.postgresql.util.PSQLException: Bad value for type double : 1,000.00
   [info] 	at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
   [info] 	at org.postgresql.jdbc.PgResultSet.getDouble(PgResultSet.java:2432)
   [info] 	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$5(JdbcUtils.scala:418)
   ```
   So we can avoid this issue by mapping the money type to `StringType`, so that Spark SQL calls `getString` rather than `getDouble`.
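   A minimal, self-contained sketch of the idea (the names `CatalystType`, `StringType`, and `getCatalystType` here are simplified stand-ins for Spark's actual dialect API, not the real signatures):
   ```scala
   // Sketch: a JDBC dialect maps database type names to Catalyst types.
   // Mapping "money" to StringType makes the reader call getString, which
   // avoids PgResultSet.toDouble failing on locale-formatted values like "1,000.00".
   object MoneyTypeMappingSketch {
     sealed trait CatalystType
     case object StringType extends CatalystType

     // Returns Some(catalystType) for types the dialect handles specially,
     // None for types left to the default mapping (or unsupported).
     def getCatalystType(typeName: String): Option[CatalystType] = typeName match {
       case "money"  => Some(StringType)
       case "_money" => None // money[]: the driver itself decodes elements via toDouble
       case _        => None
     }
   }
   ```
   With this mapping, reading a money column goes through the string getter path and the driver never attempts a numeric parse of the locale-formatted text.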
   
   For the money[] type, on the other hand, the PostgreSQL JDBC driver calls `PgResultSet.toDouble` internally:
   ```
   [info] 	at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
   [info] 	at org.postgresql.jdbc.ArrayDecoding$5.parseValue(ArrayDecoding.java:235)
   [info] 	at org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:122)
   [info] 	at org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:764)
   ```
   
   We can control how Spark SQL gets values out of the array returned by `PgResultSet.getArray`, but it's difficult to control how the JDBC driver itself decodes the elements of that array before returning it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org