
Posted to reviews@spark.apache.org by "dbatomic (via GitHub)" <gi...@apache.org> on 2024/03/14 10:14:49 UTC

[PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

dbatomic opened a new pull request, #45515:
URL: https://github.com/apache/spark/pull/45515

### What changes were proposed in this pull request?

This PR adds new golden file tests for the collation feature:
1) DESCRIBE
2) Set operations
3) Basic array operations
4) Removing the struct test, since the same case is already covered in golden files.

This change also contains a minor fix to the creation of default-value literals, which was exposed by the set operations tests.

### Why are the changes needed?

To extend test coverage for the collation feature.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

No

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "dbatomic (via GitHub)" <gi...@apache.org>.

dbatomic commented on PR #45515:
URL: https://github.com/apache/spark/pull/45515#issuecomment-2000040163

   > @MaxGekk and @HyukjinKwon - Folks, let me do the following:
   > 
   > 1. Remove the set tests from this PR. This will also remove the prod code change in `literals.scala`.
   > 2. Create a new PR for that change with a JIRA ticket.
   > 3. Use this PR only for golden files. I guess I don't need a JIRA for that? I'm asking just to clarify the rules; of course it's not a problem to create a JIRA ticket for this if needed.
   
   Created the following PR for set operations:
   https://github.com/apache/spark/pull/45536



Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "dbatomic (via GitHub)" <gi...@apache.org>.

dbatomic commented on code in PR #45515:
URL: https://github.com/apache/spark/pull/45515#discussion_r1526502213


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -195,7 +195,7 @@ object Literal {
     case TimestampNTZType => create(0L, TimestampNTZType)
     case it: DayTimeIntervalType => create(0L, it)
     case it: YearMonthIntervalType => create(0, it)
-    case StringType => Literal("")
+    case st: StringType => Literal(UTF8String.fromString(""), st)

Review Comment:
   That's right. The set intersection tests uncovered an interesting issue: we fail to create a default value for a StringType with a collation.
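
A minimal, self-contained sketch of the matching behaviour discussed in this comment, assuming (as the thread suggests) that a collated string type is a separate instance that does not compare equal to the default one. `ToyType`, `ToyString`, `DefaultToyString`, `ToyInt`, `ToyLiteral`, `defaultOld` and `defaultNew` are hypothetical stand-ins, not Spark classes, and the toy returns `None` where the real method is reported to fail.

```scala
// Toy model only: these names are hypothetical and merely mimic the shape
// of a string type that carries a collation id. They are NOT Spark classes.
object CollatedDefaultSketch {
  trait ToyType

  // A string type parameterised by a collation id; equality compares the id.
  class ToyString(val collationId: Int) extends ToyType {
    override def equals(other: Any): Boolean = other match {
      case s: ToyString => s.collationId == collationId
      case _            => false
    }
    override def hashCode(): Int = collationId
  }

  // Singleton standing in for the default (id 0) collation.
  object DefaultToyString extends ToyString(0)

  case object ToyInt extends ToyType

  final case class ToyLiteral(value: Any, dataType: ToyType)

  // Old shape: `case DefaultToyString` is a stable-identifier pattern, so it
  // matches by `==` and therefore only accepts the default collation.
  def defaultOld(dt: ToyType): Option[ToyLiteral] = dt match {
    case ToyInt           => Some(ToyLiteral(0, ToyInt))
    case DefaultToyString => Some(ToyLiteral("", DefaultToyString))
    case _                => None // assumption: the real code fails here instead
  }

  // New shape: a typed pattern matches any ToyString instance and keeps the
  // collation of the type that was passed in.
  def defaultNew(dt: ToyType): Option[ToyLiteral] = dt match {
    case ToyInt        => Some(ToyLiteral(0, ToyInt))
    case st: ToyString => Some(ToyLiteral("", st))
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    val collated = new ToyString(1)
    println(defaultOld(collated).isDefined)                   // false
    println(defaultNew(collated).map(_.dataType == collated)) // Some(true)
  }
}
```

In this toy, the typed pattern also carries the requested collation into the resulting literal, which is why the two versions are not interchangeable.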




Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "dbatomic (via GitHub)" <gi...@apache.org>.

dbatomic commented on PR #45515:
URL: https://github.com/apache/spark/pull/45515#issuecomment-1999974958

   @MaxGekk and @HyukjinKwon - 
   Folks, let me do the following:
   1) Remove the set tests from this PR. This will also remove the prod code change in `literals.scala`.
   2) Create a new PR for that change with a JIRA ticket.
   3) Use this PR only for golden files. I guess I don't need a JIRA for that? I'm asking just to clarify the rules; of course it's not a problem to create a JIRA ticket for this if needed.



Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk closed pull request #45515: [MINOR][TESTS] Collation - extending golden file coverage
URL: https://github.com/apache/spark/pull/45515



Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on code in PR #45515:
URL: https://github.com/apache/spark/pull/45515#discussion_r1526138543


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -195,7 +195,7 @@ object Literal {
     case TimestampNTZType => create(0L, TimestampNTZType)
     case it: DayTimeIntervalType => create(0L, it)
     case it: YearMonthIntervalType => create(0, it)
-    case StringType => Literal("")
+    case st: StringType => Literal(UTF8String.fromString(""), st)

Review Comment:
   As far as I can see, the new and old versions are not equivalent. The deleted one creates a literal with the default collation, but the new one uses the collation specified by the argument.



##########
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala:
##########
@@ -456,28 +456,6 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper {
     }
   }
 
-  test("create table with collations inside a struct") {
-    val tableName = "struct_collation_tbl"
-    val collationName = "UTF8_BINARY_LCASE"
-    val collationId = CollationFactory.collationNameToId(collationName)
-
-    withTable(tableName) {
-      sql(
-        s"""
-           |CREATE TABLE $tableName
-           |(c1 STRUCT<name: STRING COLLATE $collationName, age: INT>)
-           |USING PARQUET
-           |""".stripMargin)
-
-      sql(s"INSERT INTO $tableName VALUES (named_struct('name', 'aaa', 'id', 1))")
-      sql(s"INSERT INTO $tableName VALUES (named_struct('name', 'AAA', 'id', 2))")
-
-      checkAnswer(sql(s"SELECT DISTINCT collation(c1.name) FROM $tableName"),

Review Comment:
   Did you move this test somewhere? Could you point out where?




Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #45515:
URL: https://github.com/apache/spark/pull/45515#discussion_r1525631976


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##########
@@ -195,7 +195,7 @@ object Literal {
     case TimestampNTZType => create(0L, TimestampNTZType)
     case it: DayTimeIntervalType => create(0L, it)
     case it: YearMonthIntervalType => create(0, it)
-    case StringType => Literal("")
+    case st: StringType => Literal(UTF8String.fromString(""), st)

Review Comment:
   Why do we need this change?




Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on PR #45515:
URL: https://github.com/apache/spark/pull/45515#issuecomment-1998701448

   This isn't really a minor change. Let's file a new JIRA.



Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "dbatomic (via GitHub)" <gi...@apache.org>.

dbatomic commented on code in PR #45515:
URL: https://github.com/apache/spark/pull/45515#discussion_r1526521028


##########
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala:
##########
@@ -456,28 +456,6 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper {
     }
   }
 
-  test("create table with collations inside a struct") {
-    val tableName = "struct_collation_tbl"
-    val collationName = "UTF8_BINARY_LCASE"
-    val collationId = CollationFactory.collationNameToId(collationName)
-
-    withTable(tableName) {
-      sql(
-        s"""
-           |CREATE TABLE $tableName
-           |(c1 STRUCT<name: STRING COLLATE $collationName, age: INT>)
-           |USING PARQUET
-           |""".stripMargin)
-
-      sql(s"INSERT INTO $tableName VALUES (named_struct('name', 'aaa', 'id', 1))")
-      sql(s"INSERT INTO $tableName VALUES (named_struct('name', 'AAA', 'id', 2))")
-
-      checkAnswer(sql(s"SELECT DISTINCT collation(c1.name) FROM $tableName"),

Review Comment:
   The same test already exists in golden files. I mentioned this in the PR description:
   
   _Removing the struct test, since the same case is already covered in golden files._
   
   I guess it doesn't make sense to keep the same test in two places.




Re: [PR] [MINOR][TESTS] Collation - extending golden file coverage [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on PR #45515:
URL: https://github.com/apache/spark/pull/45515#issuecomment-2003227067

   +1, LGTM. Merging to master.
   Thank you, @dbatomic, and @HyukjinKwon for the review.

