Posted to issues@spark.apache.org by "Rafael (Jira)" <ji...@apache.org> on 2020/05/14 14:49:00 UTC

[jira] [Comment Edited] (SPARK-30100) Decimal Precision Inferred from JDBC via Spark

    [ https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107365#comment-17107365 ] 

Rafael edited comment on SPARK-30100 at 5/14/20, 2:48 PM:
----------------------------------------------------------

Hey guys, 
 I encountered a related precision issue.

Currently the code expects that, for the Decimal type, both precision and scale are present in the JDBC metadata:

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

 

I found out that in Oracle it is valid to declare a Decimal column without this data. When I query the metadata for such a column, I get DATA_PRECISION = NULL and DATA_SCALE = NULL.
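
To make this concrete, here is roughly how that metadata can be checked through the same JDBC connection Spark uses. This is only a sketch: the URL, credentials and MY_TABLE below are placeholders, not the real ones.
{code:python}
# Sketch only: connection details and table name are placeholders.
meta_query = """(SELECT column_name, data_type, data_precision, data_scale
                   FROM all_tab_columns
                  WHERE table_name = 'MY_TABLE') t"""

meta_df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  # placeholder URL
    .option("dbtable", meta_query)
    .option("user", "USERNAME")
    .option("password", "PASSWORD")
    .load())

# For the unconstrained NUMBER column, DATA_PRECISION and DATA_SCALE come back NULL.
meta_df.show(truncate=False)
{code}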

Then, when I run `spark-sql`, I get this error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
        at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Do you have a workaround so that spark-sql can handle such cases?
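
The directions I can think of so far (sketched below with placeholder connection details and a hypothetical AMOUNT column) are to give Spark an explicit precision/scale instead of letting it infer one: either push a CAST into the pushed-down query so Oracle reports a concrete NUMBER(p,s), or override the mapping with the JDBC reader's customSchema option (available since Spark 2.3). The latter only helps if the stored values actually fit the declared precision.
{code:python}
# Sketch only: URL, credentials, table and AMOUNT column are placeholders.

# Option 1: cast inside the query so the driver reports NUMBER(38,10) for the column.
query = "(SELECT id, CAST(amount AS NUMBER(38,10)) AS amount FROM my_table) t"

df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  # placeholder URL
    .option("dbtable", query)
    # Option 2 (alternative to the CAST): declare the type explicitly;
    # only safe if the stored values actually fit DECIMAL(38, 10).
    # .option("customSchema", "AMOUNT DECIMAL(38, 10)")
    .option("user", "USERNAME")
    .option("password", "PASSWORD")
    .load())
{code}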



> Decimal Precision Inferred from JDBC via Spark
> ----------------------------------------------
>
>                 Key: SPARK-30100
>                 URL: https://issues.apache.org/jira/browse/SPARK-30100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.4.0
>            Reporter: Joby Joje
>            Priority: Major
>
> When trying to load data from JDBC (Oracle) into Spark, there seems to be precision loss in the decimal field; as per my understanding, Spark supports *DECIMAL(38,18)*. The field from Oracle is DECIMAL(38,14), whereas Spark rounds off the last four digits, making it a precision of DECIMAL(38,10). This is happening to a few fields in the dataframe where the column is fetched using a CASE statement, whereas in the same query another field populates the right schema.
> Tried to pass the
> {code:java}
> spark.sql.decimalOperations.allowPrecisionLoss=false{code}
> conf in spark-submit, though it didn't give the desired results.
> {code:python}
> jdbcDF = spark.read \
>     .format("jdbc") \
>     .option("url", "ORACLE") \
>     .option("dbtable", "QUERY") \
>     .option("user", "USERNAME") \
>     .option("password", "PASSWORD") \
>     .load(){code}
> So, considering that Spark infers the schema from a sample of records, how does this work here? Does it use the results of the query, i.e. (SELECT * FROM TABLE_NAME JOIN ...), or does it take a different route to guess the schema for itself? Can someone shed some light on this and advise how to achieve the right decimal precision here without manipulating the query? Doing a CAST in the query does solve the issue, but I would prefer some alternatives.
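
For what it's worth: as far as I can tell, Spark does not sample rows to infer a JDBC schema. It runs a dialect-generated probe (roughly SELECT * FROM (your query) WHERE 1=0) once and builds the schema from the driver's ResultSetMetaData. And if I read OracleDialect correctly, an Oracle NUMBER that comes back with no declared precision (which is what a CASE expression can produce) falls back to decimal(38,10), which would explain the rounding described above. A minimal sketch, with placeholder connection details, of checking what Spark settled on before reading any data:
{code:python}
# Sketch only: connection details and table name are placeholders.
# Accessing .schema triggers only the metadata probe (no rows are fetched), so it is
# a cheap way to see which precision/scale Spark inferred for each column.
probe = "(SELECT * FROM my_table) t"

inferred = (spark.read
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  # placeholder URL
    .option("dbtable", probe)
    .option("user", "USERNAME")
    .option("password", "PASSWORD")
    .load()
    .schema)

print(inferred.simpleString())  # e.g. struct<id:decimal(38,0),amount:decimal(38,10)>
{code}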


