You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "mitkedb (via GitHub)" <gi...@apache.org> on 2023/10/18 07:57:37 UTC

[PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

mitkedb opened a new pull request, #43430:
URL: https://github.com/apache/spark/pull/43430

   ### What changes were proposed in this pull request?
   
   Adds the `::` syntax as syntactic sugar for casting columns. This is a pretty common syntax, and it was accepted by the SQL API in the [Semi-Structured Data API PRD](https://docs.google.com/document/d/1yNf0oE7XNZpLvsWly-MxZaxdlvMdRlZ1ZjSndtmoiWs/edit#heading=h.k50kjbi5yepj).
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, new casting syntax.
   
   ### How was this patch tested?
   
   Unit tests.
   SQL tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43430:
URL: https://github.com/apache/spark/pull/43430#issuecomment-1774008080

   @mitkedb Congratulations with your first contribution to Apache Spark!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "mitkedb (via GitHub)" <gi...@apache.org>.
mitkedb commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1365663045


##########
sql/core/src/test/resources/sql-tests/inputs/cast_op.sql:
##########
@@ -0,0 +1,148 @@
+-- cast string representing a valid fractional number to integral should truncate the number
+SELECT '1.23' :: int;
+SELECT '1.23' :: long;
+SELECT '-4.56' :: int;
+SELECT '-4.56' :: long;
+
+-- cast string which are not numbers to numeric types
+SELECT 'abc' :: int;
+SELECT 'abc' :: long;
+SELECT 'abc' :: float;
+SELECT 'abc' :: double;
+
+-- cast string representing a very large number to integral should return null
+SELECT '1234567890123' :: int;
+SELECT '12345678901234567890123' :: long;
+
+-- cast empty string to integral should return null
+SELECT '' :: int;
+SELECT '' :: long;
+SELECT '' :: float;
+SELECT '' :: double;
+
+-- cast null to integral should return null
+SELECT NULL :: int;
+SELECT NULL :: long;
+
+-- cast invalid decimal string to numeric types
+SELECT '123.a' :: int;
+SELECT '123.a' :: long;
+SELECT '123.a' :: float;
+SELECT '123.a' :: double;
+
+-- '-2147483648' is the smallest int value

Review Comment:
   Addressed all the feedback
    - reduced the number of tests
    - consolidated tests in cast.sql
    - added tests suggested by Max



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43430:
URL: https://github.com/apache/spark/pull/43430#issuecomment-1774007559

   +1, LGTM. Merging to master.
   Thank you, @mitkedb and @beliefer @cloud-fan @peter-toth @HyukjinKwon for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1367682326


##########
sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala:
##########
@@ -630,7 +630,7 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession with SQL
       exception = parseException("SELECT CAST(struct(1,2,3) AS STRUCT<INT>)"),
       errorClass = "PARSE_SYNTAX_ERROR",
       sqlState = "42601",
-      parameters = Map("error" -> "'>'", "hint" -> ""))
+      parameters = Map("error" -> "'<'", "hint" -> ": missing ')'"))

Review Comment:
   Could you debug the `DefaultErrorStrategy` to see what is the root cause?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "mitkedb (via GitHub)" <gi...@apache.org>.
mitkedb commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1366548917


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/CastingSyntaxSuite.scala:
##########
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.parser;
+
+import org.apache.spark.sql.catalyst.analysis.{AnalysisTest, UnresolvedAttribute, UnresolvedFunction, UnresolvedStar}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Cast, Expression, Literal}
+import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException, ParserInterface}
+import org.apache.spark.sql.types.{DoubleType, IntegerType}
+
+class CastingSyntaxSuite extends AnalysisTest {
+  import org.apache.spark.sql.catalyst.dsl.expressions._
+  val defaultParser = CatalystSqlParser
+
+  def assertEqual(
+      sqlCommand: String,
+      e: Expression,
+      parser: ParserInterface = defaultParser): Unit = {
+    compareExpressions(parser.parseExpression(sqlCommand), e)
+  }
+
+  def assertFails(sql: String, errorMsg: String): Unit = {
+    val e = intercept[ParseException](defaultParser.parseExpression(sql))
+    assert(e.getMessage.contains(errorMsg))
+  }
+
+  test("literals") {
+    assertEqual("123::double", Cast(Literal(123), DoubleType))
+    assertEqual("'123'::double", Cast(Literal("123"), DoubleType))
+    assertEqual("'123'::int", Cast(Literal("123"), IntegerType))
+    assertEqual("'123.0'::double", Cast(Literal("123.0"), DoubleType))
+    assertEqual("'123.0' :: double", Cast(Literal("123.0"), DoubleType))
+    assertEqual("`123`::double", Cast(UnresolvedAttribute(Seq("123")), DoubleType))
+
+    assertEqual("`123::double`", UnresolvedAttribute(Seq("123::double")))
+  }
+
+  test("named expressions") {
+    assertEqual("123::double as v", Alias(Cast(Literal(123), DoubleType), "v")())
+    assertEqual("123::double v", Alias(Cast(Literal(123), DoubleType), "v")())
+    assertEqual("123 :: double v", Alias(Cast(Literal(123), DoubleType), "v")())
+    assertEqual("abc::double v", Alias(Cast(UnresolvedAttribute("abc"), DoubleType), "v")())
+    assertEqual("`abc`::double v", Alias(Cast(UnresolvedAttribute("abc"), DoubleType), "v")())
+    assertEqual("abc.def::double v",
+      Alias(Cast(UnresolvedAttribute(Seq("abc", "def")), DoubleType), "v")())
+    assertEqual("`abc.def`::double v",
+      Alias(Cast(UnresolvedAttribute(Seq("abc.def")), DoubleType), "v")())
+  }
+
+  test("boolean expressions") {
+    assertEqual("(a and b) :: int", Cast('a && 'b, IntegerType))
+    assertEqual("(a or b) :: int", Cast('a || 'b, IntegerType))
+  }
+
+  test("arithmetic expressions") {
+    assertEqual("(a - b) :: int", Cast('a - 'b, IntegerType))
+    assertEqual("(a * b) :: int", Cast('a * 'b, IntegerType))

Review Comment:
   added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1364807554


##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -957,6 +957,7 @@ primaryExpression
     | CASE whenClause+ (ELSE elseExpression=expression)? END                                   #searchedCase
     | CASE value=expression whenClause+ (ELSE elseExpression=expression)? END                  #simpleCase
     | name=(CAST | TRY_CAST) LEFT_PAREN expression AS dataType RIGHT_PAREN                     #cast
+    | primaryExpression DOUBLE_COLON dataType                                                  #castByColon

Review Comment:
   +1. CAST is also using `expression`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1365800071


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -2128,6 +2128,15 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     }
   }
 
+  /**
+   * Create a [[Cast]] expression.
+   */
+  override def visitCastByColon(ctx: CastByColonContext): Expression = withOrigin(ctx) {
+    val rawDataType = typedVisit[DataType](ctx.dataType())
+    val dataType = CharVarcharUtils.replaceCharVarcharWithStringForCast(rawDataType)
+    Cast(expression(ctx.primaryExpression), dataType)

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1363507142


##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -957,6 +957,7 @@ primaryExpression
     | CASE whenClause+ (ELSE elseExpression=expression)? END                                   #searchedCase
     | CASE value=expression whenClause+ (ELSE elseExpression=expression)? END                  #simpleCase
     | name=(CAST | TRY_CAST) LEFT_PAREN expression AS dataType RIGHT_PAREN                     #cast
+    | primaryExpression DOUBLE_COLON dataType                                                  #castByColon

Review Comment:
   Could you clarify, please, why do you allow only `primaryExpression` but not `expression` like at the line above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1364963550


##########
sql/core/src/test/resources/sql-tests/inputs/cast_op.sql:
##########
@@ -0,0 +1,148 @@
+-- cast string representing a valid fractional number to integral should truncate the number
+SELECT '1.23' :: int;
+SELECT '1.23' :: long;
+SELECT '-4.56' :: int;
+SELECT '-4.56' :: long;
+
+-- cast string which are not numbers to numeric types
+SELECT 'abc' :: int;
+SELECT 'abc' :: long;
+SELECT 'abc' :: float;
+SELECT 'abc' :: double;
+
+-- cast string representing a very large number to integral should return null
+SELECT '1234567890123' :: int;
+SELECT '12345678901234567890123' :: long;
+
+-- cast empty string to integral should return null
+SELECT '' :: int;
+SELECT '' :: long;
+SELECT '' :: float;
+SELECT '' :: double;
+
+-- cast null to integral should return null
+SELECT NULL :: int;
+SELECT NULL :: long;
+
+-- cast invalid decimal string to numeric types
+SELECT '123.a' :: int;
+SELECT '123.a' :: long;
+SELECT '123.a' :: float;
+SELECT '123.a' :: double;
+
+-- '-2147483648' is the smallest int value

Review Comment:
   I agree, it is better to test more corner cases:
   1. Chain of casts: `<expr> :: type1 :: type2`
   2. Regular cast + new one: `CAST(<expr> :: type1 AS type2)`
   3. Precedence of operators: `map[1] :: type1 [2]`
   4. If you support only primary expr, add a negative test for non-primary expr
   5. Incorrect taget type: `<expr> :: BINT`, `<expr> :: SELECT`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "mitkedb (via GitHub)" <gi...@apache.org>.
mitkedb commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1365912215


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -2128,6 +2128,15 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     }
   }
 
+  /**
+   * Create a [[Cast]] expression.

Review Comment:
   fixed.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -2128,6 +2128,15 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     }
   }
 
+  /**
+   * Create a [[Cast]] expression.
+   */
+  override def visitCastByColon(ctx: CastByColonContext): Expression = withOrigin(ctx) {
+    val rawDataType = typedVisit[DataType](ctx.dataType())
+    val dataType = CharVarcharUtils.replaceCharVarcharWithStringForCast(rawDataType)
+    Cast(expression(ctx.primaryExpression), dataType)

Review Comment:
   fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #43430: [SPARK-45574][SQL] Add :: syntax as a shorthand for casting
URL: https://github.com/apache/spark/pull/43430


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1364963550


##########
sql/core/src/test/resources/sql-tests/inputs/cast_op.sql:
##########
@@ -0,0 +1,148 @@
+-- cast string representing a valid fractional number to integral should truncate the number
+SELECT '1.23' :: int;
+SELECT '1.23' :: long;
+SELECT '-4.56' :: int;
+SELECT '-4.56' :: long;
+
+-- cast string which are not numbers to numeric types
+SELECT 'abc' :: int;
+SELECT 'abc' :: long;
+SELECT 'abc' :: float;
+SELECT 'abc' :: double;
+
+-- cast string representing a very large number to integral should return null
+SELECT '1234567890123' :: int;
+SELECT '12345678901234567890123' :: long;
+
+-- cast empty string to integral should return null
+SELECT '' :: int;
+SELECT '' :: long;
+SELECT '' :: float;
+SELECT '' :: double;
+
+-- cast null to integral should return null
+SELECT NULL :: int;
+SELECT NULL :: long;
+
+-- cast invalid decimal string to numeric types
+SELECT '123.a' :: int;
+SELECT '123.a' :: long;
+SELECT '123.a' :: float;
+SELECT '123.a' :: double;
+
+-- '-2147483648' is the smallest int value

Review Comment:
   I agree, it is better to test more corner cases instead of regular values:
   1. Chain of casts: `<expr> :: type1 :: type2`
   2. Regular cast + new one: `CAST(<expr> :: type1 AS type2)`
   3. Precedence of operators: `map[1] :: type1 [2]`
   4. If you support only primary expr, add a negative test for non-primary expr
   5. Incorrect taget type: `<expr> :: BINT`, `<expr> :: SELECT`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "mitkedb (via GitHub)" <gi...@apache.org>.
mitkedb commented on PR #43430:
URL: https://github.com/apache/spark/pull/43430#issuecomment-1773068183

   Debugging the remaining two unittest failures...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1365800932


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/CastingSyntaxSuite.scala:
##########
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.parser;
+
+import org.apache.spark.sql.catalyst.analysis.{AnalysisTest, UnresolvedAttribute, UnresolvedFunction, UnresolvedStar}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Cast, Expression, Literal}
+import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException, ParserInterface}
+import org.apache.spark.sql.types.{DoubleType, IntegerType}
+
+class CastingSyntaxSuite extends AnalysisTest {
+  import org.apache.spark.sql.catalyst.dsl.expressions._
+  val defaultParser = CatalystSqlParser
+
+  def assertEqual(
+      sqlCommand: String,
+      e: Expression,
+      parser: ParserInterface = defaultParser): Unit = {
+    compareExpressions(parser.parseExpression(sqlCommand), e)
+  }
+
+  def assertFails(sql: String, errorMsg: String): Unit = {
+    val e = intercept[ParseException](defaultParser.parseExpression(sql))
+    assert(e.getMessage.contains(errorMsg))
+  }
+
+  test("literals") {
+    assertEqual("123::double", Cast(Literal(123), DoubleType))
+    assertEqual("'123'::double", Cast(Literal("123"), DoubleType))
+    assertEqual("'123'::int", Cast(Literal("123"), IntegerType))
+    assertEqual("'123.0'::double", Cast(Literal("123.0"), DoubleType))
+    assertEqual("'123.0' :: double", Cast(Literal("123.0"), DoubleType))
+    assertEqual("`123`::double", Cast(UnresolvedAttribute(Seq("123")), DoubleType))
+
+    assertEqual("`123::double`", UnresolvedAttribute(Seq("123::double")))
+  }
+
+  test("named expressions") {
+    assertEqual("123::double as v", Alias(Cast(Literal(123), DoubleType), "v")())
+    assertEqual("123::double v", Alias(Cast(Literal(123), DoubleType), "v")())
+    assertEqual("123 :: double v", Alias(Cast(Literal(123), DoubleType), "v")())
+    assertEqual("abc::double v", Alias(Cast(UnresolvedAttribute("abc"), DoubleType), "v")())
+    assertEqual("`abc`::double v", Alias(Cast(UnresolvedAttribute("abc"), DoubleType), "v")())
+    assertEqual("abc.def::double v",
+      Alias(Cast(UnresolvedAttribute(Seq("abc", "def")), DoubleType), "v")())
+    assertEqual("`abc.def`::double v",
+      Alias(Cast(UnresolvedAttribute(Seq("abc.def")), DoubleType), "v")())
+  }
+
+  test("boolean expressions") {
+    assertEqual("(a and b) :: int", Cast('a && 'b, IntegerType))
+    assertEqual("(a or b) :: int", Cast('a || 'b, IntegerType))
+  }
+
+  test("arithmetic expressions") {
+    assertEqual("(a - b) :: int", Cast('a - 'b, IntegerType))
+    assertEqual("(a * b) :: int", Cast('a * 'b, IntegerType))

Review Comment:
   can we also test `a + b::int` and make sure it produces `Add(a, Cast(b, int))`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43430:
URL: https://github.com/apache/spark/pull/43430#issuecomment-1772296311

   This test failure seems related to your changes:
   ```
   [info] - using _FUNC_ instead of function names in examples *** FAILED *** (7 milliseconds)
   [info]   Expression class 'org.apache.spark.sql.catalyst.expressions.Cast' collection.this.IterableOnce.iterableOnceExtensionMethods[String](exampleRe.findAllIn(exprExamples)).toIterable.filter(((x$1: String) => setStmtRe.findFirstIn(x$1).isEmpty)).forall(((x$2: String) => x$2.contains("_FUNC_"))) was false (ExpressionInfoSuite.scala:125)
   [info]   org.scalatest.exceptions.TestFailedException:
   ```
   @mitkedb Please, take a look at it and fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1365345766


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -2128,6 +2128,15 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     }
   }
 
+  /**
+   * Create a [[Cast]] expression.

Review Comment:
   `Create a [[Cast]] expression with '::' syntax.`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -438,11 +438,14 @@ object Cast extends QueryErrorsBase {
  * session local timezone by an analyzer [[ResolveTimeZone]].
  */
 @ExpressionDescription(
-  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`.",
+  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`." +
+          " `expr` :: `type` alternative casting syntax is also supported.",

Review Comment:
   Shall we add a blank line to separate the two lines? so that the document more clear.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -2128,6 +2128,15 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     }
   }
 
+  /**
+   * Create a [[Cast]] expression.
+   */
+  override def visitCastByColon(ctx: CastByColonContext): Expression = withOrigin(ctx) {
+    val rawDataType = typedVisit[DataType](ctx.dataType())
+    val dataType = CharVarcharUtils.replaceCharVarcharWithStringForCast(rawDataType)
+    Cast(expression(ctx.primaryExpression), dataType)

Review Comment:
   Shall we set the tag value for `USER_SPECIFIED_CAST`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43430:
URL: https://github.com/apache/spark/pull/43430#issuecomment-1774007356

   > If this is the case, the fix is to just update the expected output to match what we're getting now.
   
   I agree, let's fix the expected error params.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1364904066


##########
sql/core/src/test/resources/sql-tests/inputs/cast_op.sql:
##########
@@ -0,0 +1,148 @@
+-- cast string representing a valid fractional number to integral should truncate the number
+SELECT '1.23' :: int;
+SELECT '1.23' :: long;
+SELECT '-4.56' :: int;
+SELECT '-4.56' :: long;
+
+-- cast string which are not numbers to numeric types
+SELECT 'abc' :: int;
+SELECT 'abc' :: long;
+SELECT 'abc' :: float;
+SELECT 'abc' :: double;
+
+-- cast string representing a very large number to integral should return null
+SELECT '1234567890123' :: int;
+SELECT '12345678901234567890123' :: long;
+
+-- cast empty string to integral should return null
+SELECT '' :: int;
+SELECT '' :: long;
+SELECT '' :: float;
+SELECT '' :: double;
+
+-- cast null to integral should return null
+SELECT NULL :: int;
+SELECT NULL :: long;
+
+-- cast invalid decimal string to numeric types
+SELECT '123.a' :: int;
+SELECT '123.a' :: long;
+SELECT '123.a' :: float;
+SELECT '123.a' :: double;
+
+-- '-2147483648' is the smallest int value

Review Comment:
   maybe just add a few end-to-end tests in `cast.sql`? The underlying expression is the same so we probably don't need to test so many different values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1364902504


##########
sql/core/src/test/resources/sql-tests/inputs/cast_op.sql:
##########
@@ -0,0 +1,148 @@
+-- cast string representing a valid fractional number to integral should truncate the number
+SELECT '1.23' :: int;
+SELECT '1.23' :: long;
+SELECT '-4.56' :: int;
+SELECT '-4.56' :: long;
+
+-- cast string which are not numbers to numeric types
+SELECT 'abc' :: int;
+SELECT 'abc' :: long;
+SELECT 'abc' :: float;
+SELECT 'abc' :: double;
+
+-- cast string representing a very large number to integral should return null
+SELECT '1234567890123' :: int;
+SELECT '12345678901234567890123' :: long;
+
+-- cast empty string to integral should return null
+SELECT '' :: int;
+SELECT '' :: long;
+SELECT '' :: float;
+SELECT '' :: double;
+
+-- cast null to integral should return null
+SELECT NULL :: int;
+SELECT NULL :: long;
+
+-- cast invalid decimal string to numeric types
+SELECT '123.a' :: int;
+SELECT '123.a' :: long;
+SELECT '123.a' :: float;
+SELECT '123.a' :: double;
+
+-- '-2147483648' is the smallest int value

Review Comment:
   I don't think we need so many tests tho.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1363507142


##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -957,6 +957,7 @@ primaryExpression
     | CASE whenClause+ (ELSE elseExpression=expression)? END                                   #searchedCase
     | CASE value=expression whenClause+ (ELSE elseExpression=expression)? END                  #simpleCase
     | name=(CAST | TRY_CAST) LEFT_PAREN expression AS dataType RIGHT_PAREN                     #cast
+    | primaryExpression DOUBLE_COLON dataType                                                  #castByColon

Review Comment:
   Could you clarify, please, don't you allow only `primaryExpression` but not `expression` like at the line above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1364902987


##########
sql/core/src/test/resources/sql-tests/inputs/cast_op.sql:
##########
@@ -0,0 +1,148 @@
+-- cast string representing a valid fractional number to integral should truncate the number
+SELECT '1.23' :: int;
+SELECT '1.23' :: long;
+SELECT '-4.56' :: int;
+SELECT '-4.56' :: long;
+
+-- cast string which are not numbers to numeric types
+SELECT 'abc' :: int;
+SELECT 'abc' :: long;
+SELECT 'abc' :: float;
+SELECT 'abc' :: double;
+
+-- cast string representing a very large number to integral should return null
+SELECT '1234567890123' :: int;
+SELECT '12345678901234567890123' :: long;
+
+-- cast empty string to integral should return null
+SELECT '' :: int;
+SELECT '' :: long;
+SELECT '' :: float;
+SELECT '' :: double;
+
+-- cast null to integral should return null
+SELECT NULL :: int;
+SELECT NULL :: long;
+
+-- cast invalid decimal string to numeric types
+SELECT '123.a' :: int;
+SELECT '123.a' :: long;
+SELECT '123.a' :: float;
+SELECT '123.a' :: double;
+
+-- '-2147483648' is the smallest int value

Review Comment:
   Can we just have few tests? A lot of tests are duplicates with `CastingSyntaxSuite` too



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "mitkedb (via GitHub)" <gi...@apache.org>.
mitkedb commented on PR #43430:
URL: https://github.com/apache/spark/pull/43430#issuecomment-1773091094

   I can repro the failures with the diff, and they do not repro without the diff.
   
   The following two unittest asserts are failing in QueryParsingErrorsSuite:
   
       checkError(
         exception = parseException("SELECT CAST(map('1',2) AS MAP<STRING>)"),
         errorClass = "PARSE_SYNTAX_ERROR",
         sqlState = "42601",
         parameters = Map("error" -> "'>'", "hint" -> ""))
         
         
         and 
         
             checkError(
         exception = parseException("SELECT CAST(struct(1,2,3) AS STRUCT<INT>)"),
         errorClass = "PARSE_SYNTAX_ERROR",
         sqlState = "42601",
         parameters = Map("error" -> "'>'", "hint" -> ""))
         
         
   My hypothesis is that the SparkParserErrorStrategy result changed with the updates we made to the parser, and triggered a slightly different recovery error message. If this is the case, the fix is to just update the expected output to match what we're getting now. Unfortunately, this check is fairly fragile, but it is unclear to me how to fix it (and even if we want to fix, we should do it separately) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45574][SQL] Add :: syntax as a shorthand for casting [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43430:
URL: https://github.com/apache/spark/pull/43430#discussion_r1366643127


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
##########
@@ -438,11 +438,14 @@ object Cast extends QueryErrorsBase {
  * session local timezone by an analyzer [[ResolveTimeZone]].
  */
 @ExpressionDescription(
-  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`.",
+  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`." +
+          " `expr` :: `type` alternative casting syntax is also supported.",
   examples = """
     Examples:
       > SELECT _FUNC_('10' as int);
        10
+      > SELECT '10' :: int;

Review Comment:
   We usually use `_FUNC_`, BTW, but this case is special.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org