Posted to issues@spark.apache.org by "Krzysztof Sobusiak (JIRA)" <ji...@apache.org> on 2017/09/07 12:31:02 UTC

[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

    [ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156829#comment-16156829 ] 

Krzysztof Sobusiak edited comment on SPARK-20427 at 9/7/17 12:30 PM:
---------------------------------------------------------------------

You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified.
Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 significant digits
* Negative numbers from -1 x 10^-130 to 9.99...99 x 10^125 with up to 38 significant digits
{quote}

As already noted, Spark currently throws an exception for very big numbers (e.g. "Decimal precision 61 exceeds max precision 38" for 1E+50).
What has not been noted yet is that it also truncates very small numbers (such as 1E-50) to 0.
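
For reference, this is the kind of read that shows both behaviours; the connection URL, credentials and table name below are made up, and the table is assumed to hold a single NUMBER column, declared without precision or scale, containing the values 1E+50 and 1E-50:
{code:scala}
// Hypothetical Oracle connection and table: one NUMBER column declared
// without precision/scale, containing the values 1E+50 and 1E-50.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
  .option("dbtable", "TEST_NUMBERS")
  .option("user", "test_user")
  .option("password", "test_password")
  .load()

// The 1E+50 row makes the read fail with "Decimal precision ... exceeds max
// precision 38"; with only the 1E-50 row the read succeeds but shows 0.
df.show()
{code}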

As far as I understand, you cannot fit all of these numbers into {{DecimalType}} at the same time, whatever precision and scale you set: keeping 1E-50 exact already requires scale >= 50, while 1E+50 requires precision - scale >= 51, i.e. precision >= 101, far beyond the 38-digit limit. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.
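
Until the defaults change, a custom {{JdbcDialect}} along these lines seems to work as a stop-gap, mapping unconstrained NUMBER columns to {{DoubleType}} (at the cost of exactness). The dialect name is made up, and the {{size == 0}} test assumes the driver reports precision 0 for NUMBER declared without precision/scale, so verify against your driver:
{code:scala}
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, DoubleType, MetadataBuilder}

// Example dialect (name is made up): map Oracle NUMBER columns declared
// without precision/scale to DoubleType instead of a DecimalType.
object UnconstrainedNumberAsDouble extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // size is the precision reported by the driver; 0 is assumed to mean
    // that the NUMBER column has no declared precision/scale.
    if (sqlType == Types.NUMERIC && size == 0) Some(DoubleType) else None
  }
}

// A user-registered dialect is consulted before the built-in OracleDialect.
JdbcDialects.registerDialect(UnconstrainedNumberAsDouble)
{code}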

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur with PostgreSQL's NUMERIC. BTW, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *without* precision and scale is very different from NUMBER/NUMERIC *with* precision and scale.
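
For what it's worth, the difference between the two is visible in plain JDBC metadata, which is essentially what Spark sees when it picks a Catalyst type for a column. A quick sketch to inspect it (URL, credentials, table and column names are placeholders; the same code can be pointed at Oracle instead of PostgreSQL):
{code:scala}
import java.sql.DriverManager

// Placeholder connection details, table and column names.
val conn = DriverManager.getConnection(
  "jdbc:postgresql://dbhost/testdb", "test_user", "test_password")
val rs = conn.createStatement().executeQuery(
  "SELECT unconstrained_col, constrained_col FROM test_numbers")
val meta = rs.getMetaData
for (i <- 1 to meta.getColumnCount) {
  // Precision and scale as reported by the driver for each column.
  println(s"${meta.getColumnName(i)}: type=${meta.getColumnTypeName(i)} " +
    s"precision=${meta.getPrecision(i)} scale=${meta.getScale(i)}")
}
rs.close()
conn.close()
{code}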


was (Author: sobusiak):
You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified.
Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 significant digits
* Negative numbers from -1 x 10^-130 to 9.99...99 x 10^125 with up to 38 significant digits
{quote}

As already noted, Spark currently throws an exception for very big numbers (e.g. "Decimal precision 61 exceeds max precision 38" for 1E+50).
What has not been noted yet is that it also truncates very small numbers (such as 1E-50) to 0.

As far as I understand, you cannot fit all of these numbers into {{DecimalType}} at the same time, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur with PostgreSQL's NUMERIC. BTW, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *is not* {{DecimalType}}.

> Issue with Spark interpreting Oracle datatype NUMBER
> ----------------------------------------------------
>
>                 Key: SPARK-20427
>                 URL: https://issues.apache.org/jira/browse/SPARK-20427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Alexander Andrushenko
>
> In Oracle there exists the data type NUMBER. When defining a field in a table as type NUMBER, the field has two components: precision and scale.
> For example, NUMBER(p,s) has precision p and scale s. 
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 44 exceeds max precision 38...".
> The result was that a data frame read from a table in one schema could not be inserted into the identical table in another schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org