You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "comphead (via GitHub)" <gi...@apache.org> on 2023/07/28 00:35:02 UTC

[GitHub] [arrow-datafusion] comphead opened a new issue, #7121: [BUG] CAST Decimal gives incorrect result

comphead opened a new issue, #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121

   ### Describe the bug
   
   CAST Decimal gives incorrect result
   
   ### To Reproduce
   
   Spark/PG
   ```
   select cast(6.4053151420411946063694043751862251568 as decimal(38,37))
   
   6.4053151420411946063694043751862251568
   ```
   DF
   ```
   ❯ select cast(6.4053151420411946063694043751862251568 as decimal(38,37));
   +-----------------------------------------+
   | Float64(6.405315142041195)              |
   +-----------------------------------------+
   | 6.4053151420411954619510965656866521088 |
   +-----------------------------------------+
   ```
   
   ### Expected behavior
   
   Should cast correctly
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121#issuecomment-1655997087

   Thanks @Dandandan 
   
   ```
   ❯ set datafusion.sql_parser.parse_float_as_decimal=true;
   
   ❯ select cast(6.4053151420411946063694043751862251568 as decimal(38,37)) + 1;
   +---------------------------------------------------------------------------+
   | Decimal128(Some(64053151420411946063694043751862251568),38,37) + Int64(1) |
   +---------------------------------------------------------------------------+
   | 7.4053151420411946063694043751862251568                                   |
   +---------------------------------------------------------------------------+
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on issue #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121#issuecomment-1655990315

   There is a config in DF whether to parse floating numbers as decimal by default, it was disabled earlier because of issues, but maybe by now we should enable it by default.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead closed issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead closed issue #7121: [BUG] CAST Decimal gives incorrect result
URL: https://github.com/apache/arrow-datafusion/issues/7121


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121#issuecomment-1654803708

   I'll check that


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121#issuecomment-1654989138

   I would still treat this as a bug, my vision the problem is DF cannot identify correctly decimal as a input and does implicit conversion to the f64 which is not expected.
   
   Both Postgres and Spark returns correct result for 
   ```
   select 6.4053151420411946063694043751862251568 + 1;
   
   7.4053151420411946063694043751862251568
   ```
   
   whereas DF likely does unneccessary cast and thus loss of precision
   
   ```
   
   ❯ select 6.4053151420411946063694043751862251568 + 1;
   +---------------------------------------+
   | Float64(6.405315142041195) + Int64(1) |
   +---------------------------------------+
   | 7.405315142041195                     |
   +---------------------------------------+
   1 row in set. Query took 0.057 seconds.
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] viirya commented on issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on issue #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121#issuecomment-1654866409

   I'm not sure if this is a bug.
   
   Actually, even you do something like:
   
   ```
   ❯ select 6.4053151420411946063694043751862251568;
   +----------------------------+
   | Float64(6.405315142041195) |
   +----------------------------+
   | 6.405315142041195          |
   +----------------------------+
   1 row in set. Query took 0.005 seconds.
   ```
   
   You can see that you still cannot get full precision of the input floating point.
   
   It is more like the precision limit instead of an incorrect result.
   
   Further, if you look at Spark, `6.4053151420411946063694043751862251568` is treated as decimal, instead of float64
   
   ```
   scala> sql("select 6.4053151420411946063694043751862251568").printSchema
   root
    |-- 6.4053151420411946063694043751862251568: decimal(38,37) (nullable = false)
   ```
   
   So actually in Spark you just cast decimal to decimal?
   
   But in DataFusion, it is float64:
   
   ```
   ❯ explain select 6.4053151420411946063694043751862251568;
   +---------------+------------------------------------------------------------------------+
   | plan_type     | plan                                                                   |
   +---------------+------------------------------------------------------------------------+
   | logical_plan  | Projection: Float64(6.405315142041195)                                 |
   |               |   EmptyRelation                                                        |
   | physical_plan | ProjectionExec: expr=[6.405315142041195 as Float64(6.405315142041195)] |
   |               |   EmptyExec: produce_one_row=true                                      |
   |               |                                                                        |
   +---------------+------------------------------------------------------------------------+
   2 rows in set. Query took 0.006 seconds.
   
   ```
   
   It explains why casted result is the exactly same as input because there will be precision loss when casting floating-point value to decimal.
   
   Then, let's force Spark to treat input value as float64?
   
   ```
   scala> sql("select cast(6.4053151420411946063694043751862251568d as decimal(38,37))").collect()
   res13: Array[org.apache.spark.sql.Row] = Array([6.4053151420411950000000000000000000000])
   ```
   
   You can see that now Spark also returns a precision loss decimal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] viirya commented on issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on issue #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121#issuecomment-1655043242

   Then it is a question about what type of value should DataFusion parse from a literal like `6.4053151420411946063694043751862251568`, it should be currently float64, or decimal like Spark?
   
   This could be discussed.
   
   I'm not sure if it is designed by intention or just a mistake (bug?).
   
   Just to clarify that this is not a bug that DataFusion cast produces an incorrect result. Actually the cast result is correct as it treats the input as float64 instead of decimal. Otherwise, it might be confused for others who look for the root cause in cast function but actually it is not there.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121#issuecomment-1655981570

   another test case showing the cast is likely ok, at least the query gives expected value casting from string literal
   ```
   ❯ select cast('6.4053151420411946063694043751862251568' as decimal(38,37)) + 1;
   +------------------------------------------------------------+
   | Utf8("6.4053151420411946063694043751862251568") + Int64(1) |
   +------------------------------------------------------------+
   | 7.4053151420411946063694043751862251568                    |
   +------------------------------------------------------------+
   ❯ select arrow_typeof(cast('6.4053151420411946063694043751862251568' as decimal(38,37)) + 1);
   +--------------------------------------------------------------------------+
   | arrow_typeof(Utf8("6.4053151420411946063694043751862251568") + Int64(1)) |
   +--------------------------------------------------------------------------+
   | Decimal128(38, 37)                                                       |
   +--------------------------------------------------------------------------+
   ```
   
   whereas numeric literal is casted forcibly to float first , which not expected as for me
   
   
   ```
   ❯ select cast(6.4053151420411946063694043751862251568 as decimal(38,37)) + 1;
   +-----------------------------------------+
   | Float64(6.405315142041195) + Int64(1)   |
   +-----------------------------------------+
   | 7.4053151420411954619510965656866521088 |
   +-----------------------------------------+
   1 row in set. Query took 0.002 seconds.
   ❯ select arrow_typeof(cast(6.4053151420411946063694043751862251568 as decimal(38,37)) + 1);
   +-----------------------------------------------------+
   | arrow_typeof(Float64(6.405315142041195) + Int64(1)) |
   +-----------------------------------------------------+
   | Decimal128(38, 37)                                  |
   +-----------------------------------------------------+
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on issue #7121: [BUG] CAST Decimal gives incorrect result

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on issue #7121:
URL: https://github.com/apache/arrow-datafusion/issues/7121#issuecomment-1655991899

   https://github.com/apache/arrow-datafusion/blob/55930fbffe34611e42676c52a31f7e90ccd77ca5/datafusion/sql/src/planner.rs#L62


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org