Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/21 10:12:11 UTC

[GitHub] [spark] cloud-fan edited a comment on issue #27150: [SPARK-30471][SQL] Fix issue when comparing String and IntegerType

URL: https://github.com/apache/spark/pull/27150#issuecomment-589560363
 
 
   Since this has already missed 3.0, I'm wondering whether we should make a more thorough change to fix this problem.
   
   Comparing a string with a numeric value is a terrible choice: the string content is unknown, so we have to think about many corner cases.
   
   **case 1: compare string and integer:**
   The string may hold a very large number beyond Long.Max, or a fractional number.
   I think it's better to ansi_cast both sides to long, and fail if the string content overflows long or is a fraction. This is also the behavior of PostgreSQL:
   ```
   cloud0fan=# select '2' > 1;
    ?column? 
   ----------
    t
   (1 row)
   
   cloud0fan=# select '2.2' > 1;
   ERROR:  invalid input syntax for integer: "2.2"
   LINE 1: select '2.2' > 1;
   ```
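A minimal Python sketch of the case 1 rule, assuming a strict ANSI cast that accepts only an optional sign followed by ASCII digits (no surrounding whitespace, no fraction, no exponent) and rejects values outside the 64-bit long range. `ansi_cast_to_long` and `compare_string_with_int` are hypothetical names for illustration, not Spark APIs; Spark's real cast lives in Catalyst and may differ in details such as whitespace handling.

```python
import re

# 64-bit signed range of a JVM long.
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1

# Assumption: only an optional sign plus ASCII digits counts as a valid integer string.
_INTEGRAL = re.compile(r"[+-]?[0-9]+")


def ansi_cast_to_long(s: str) -> int:
    """Hypothetical strict cast: raise instead of truncating or returning null."""
    if not _INTEGRAL.fullmatch(s):
        # Fractions such as '2.2' land here, like the pgsql error above.
        raise ValueError(f"invalid input syntax for long: {s!r}")
    n = int(s)
    if not LONG_MIN <= n <= LONG_MAX:
        raise ValueError(f"value out of range for long: {s!r}")
    return n


def compare_string_with_int(s: str, i: int) -> int:
    """Return -1, 0 or 1 after casting both sides to long."""
    left = ansi_cast_to_long(s)
    right = i  # an int always fits in a long, so this widening cannot fail
    return (left > right) - (left < right)


print(compare_string_with_int("2", 1))  # 1
try:
    compare_string_with_int("2.2", 1)
except ValueError as e:
    print(e)  # invalid input syntax for long: '2.2'
```

Failing loudly instead of returning null (or silently truncating '2.2' to 2) is the point of using an ANSI cast here: a bad string surfaces as an error rather than as a wrong comparison result.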
   
   **case 2: compare string and float/double:**
   Similarly, ansi_cast both sides to double, since it's the wider of the two types.
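A minimal Python sketch of the case 2 rule, assuming the string must be in plain decimal or scientific notation; special values such as 'NaN' or 'Infinity' are simply rejected here, and NaN ordering on the numeric side is not modeled. `ansi_cast_to_double` and `compare_string_with_double` are hypothetical names, not Spark APIs.

```python
import math
import re

# Assumption: plain decimal or scientific notation only; 'NaN', 'Infinity' etc. are rejected.
_NUMERIC = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def ansi_cast_to_double(s: str) -> float:
    """Hypothetical strict cast of a string to a 64-bit double."""
    if not _NUMERIC.fullmatch(s):
        raise ValueError(f"invalid input syntax for double: {s!r}")
    return float(s)


def compare_string_with_double(s: str, x: float) -> int:
    """Return -1, 0 or 1 after casting both sides to double."""
    left = ansi_cast_to_double(s)
    right = float(x)  # Python floats are already 64-bit doubles
    if math.isnan(right):
        # NaN ordering is not modeled in this sketch.
        raise ValueError("NaN comparison is out of scope for this sketch")
    return (left > right) - (left < right)


print(compare_string_with_double("2.5", 2.0))  # 1
print(compare_string_with_double("1e3", 1000.0))  # 0
```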
   
   **case 3: compare string and decimal:**
   Decimal is an exact numeric type, so precision loss is not acceptable. I think we should ansi_cast both sides to `decimal(max_precision, original_scale + 1)`, where `original_scale` is the scale of the decimal operand.
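A minimal Python sketch of the case 3 rule using the standard `decimal` module, assuming a maximum precision of 38 (Spark's decimal limit) and half-up rounding for fractional digits beyond the target scale; the rounding mode and the cap that keeps the scale at or below the precision are my assumptions, not part of the proposal. `ansi_cast_to_decimal` and `compare_string_with_decimal` are hypothetical names, not Spark APIs.

```python
from decimal import ROUND_HALF_UP, Decimal, InvalidOperation, localcontext
from typing import Union

MAX_PRECISION = 38  # Spark's maximum decimal precision


def ansi_cast_to_decimal(v: Union[str, Decimal], precision: int, scale: int) -> Decimal:
    """Hypothetical strict cast to decimal(precision, scale); raises instead of returning null."""
    try:
        d = v if isinstance(v, Decimal) else Decimal(v)
    except InvalidOperation:
        raise ValueError(f"invalid input syntax for decimal: {v!r}") from None
    if not d.is_finite():  # rejects 'NaN' and 'Infinity'
        raise ValueError(f"invalid input syntax for decimal: {v!r}")
    overflow = f"value out of range for decimal({precision}, {scale}): {v!r}"
    try:
        # Wide working context so quantize() is not limited by Python's default 28 digits.
        with localcontext() as ctx:
            ctx.prec = 1000
            # Assumption: digits beyond `scale` are rounded half-up.
            q = d.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    except InvalidOperation:
        raise ValueError(overflow) from None
    if len(q.as_tuple().digits) > precision:
        raise ValueError(overflow)
    return q


def compare_string_with_decimal(s: str, value: Decimal, original_scale: int) -> int:
    """Cast both sides to decimal(MAX_PRECISION, original_scale + 1); return -1, 0 or 1."""
    # Assumption: cap the scale so the target type stays valid (scale <= precision).
    target_scale = min(original_scale + 1, MAX_PRECISION)
    left = ansi_cast_to_decimal(s, MAX_PRECISION, target_scale)
    right = ansi_cast_to_decimal(value, MAX_PRECISION, target_scale)
    return (left > right) - (left < right)


print(compare_string_with_decimal("1.235", Decimal("1.23"), 2))  # 1
```

The extra digit of scale lets a string such as '1.235' be distinguished from a `decimal(p, 2)` value of 1.23 instead of being rounded onto it.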
   
   More importantly, like many other SQL systems, we should only allow this comparison for string literals. The content of a non-literal string is unknown and the cast is very likely to fail at runtime; allowing only string literals means an invalid value fails early, at compile time.
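The literal-only rule amounts to a check during query analysis: in a string-vs-numeric comparison, the string side must be a literal. Below is a minimal Python sketch of that check. `Literal`, `Column`, `AnalysisError` and `check_string_numeric_comparison` are hypothetical stand-ins, not Catalyst classes, and data types are plain strings for brevity. Since a literal's value is already known at this point, its cast (under the case 1 to 3 rules) could also be evaluated here, which is what makes a bad value fail at compile time.

```python
from dataclasses import dataclass
from typing import Union

NUMERIC_TYPES = {"int", "long", "float", "double", "decimal"}


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a constant in the query plan."""
    value: object
    data_type: str


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column whose values are only known at runtime."""
    name: str
    data_type: str


Expr = Union[Literal, Column]


class AnalysisError(Exception):
    """Raised while analyzing the query, before any data is read."""


def check_string_numeric_comparison(left: Expr, right: Expr) -> None:
    """Reject string-vs-numeric comparisons unless the string side is a literal."""
    types = {left.data_type, right.data_type}
    if "string" not in types or not (types & NUMERIC_TYPES):
        return  # not a string-vs-numeric comparison; nothing to check
    string_side = left if left.data_type == "string" else right
    if not isinstance(string_side, Literal):
        raise AnalysisError("cannot compare a non-literal string with a numeric value")


# Allowed: '2' > a, where a is an int column.
check_string_numeric_comparison(Literal("2", "string"), Column("a", "int"))
```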
   
   Also cc @maropu @viirya

----------------------------------------------------------------
This is an automated message from the Apache Git Service.