Posted to reviews@spark.apache.org by "panbingkun (via GitHub)" <gi...@apache.org> on 2024/03/25 07:57:34 UTC

[PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` [spark]

panbingkun opened a new pull request, #45692:
URL: https://github.com/apache/spark/pull/45692

   ### What changes were proposed in this pull request?
   This PR adds a unit test (UT) covering `nested structure` input for the function `to_csv`.
   Follow-up to: https://github.com/apache/spark/pull/45657
   
   
   ### Why are the changes needed?
   Adding the UT improves test coverage.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   - Manually tested.
   - Passes GA (GitHub Actions).
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #45692:
URL: https://github.com/apache/spark/pull/45692#issuecomment-2019197760

   thanks, merging to master!



Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #45692:
URL: https://github.com/apache/spark/pull/45692#discussion_r1538390049


##########
sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala:
##########
@@ -768,4 +768,32 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession {
       context = ExpectedContext(fragment = "to_csv", getCurrentClassCallSitePattern)
     )
   }
+
+  test("SPARK-47497: to_csv support the data of nested structure as pretty strings") {
+    // The item of the Array is a Map
+    val rows = new java.util.ArrayList[Row]()
+    rows.add(Row(1L, Row(2L, "Alice",
+      Array(Map("math" -> 100L, "english" -> 200L, "science" -> null),
+        Map("math" -> 300L, "english" -> 400L, "science" -> 500L)))))
+
+    val valueSchema = StructType(Seq(
+      StructField("age", LongType),
+      StructField("name", StringType),
+      StructField("scores", ArrayType(MapType(StringType, LongType)))))
+    val schema = StructType(Seq(
+      StructField("key", LongType),
+      StructField("value", valueSchema)))
+
+    val df = spark.createDataFrame(rows, schema)
+    val actual1 = df.select(to_csv($"value"))
+    checkAnswer(actual1, Row("2,Alice," +
+      "\"[{math -> 100, english -> 200, science ->}, " +

Review Comment:
   What about Spark Connect with Python?




Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan closed pull request #45692: [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv`
URL: https://github.com/apache/spark/pull/45692



Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #45692:
URL: https://github.com/apache/spark/pull/45692#discussion_r1537396948



Review Comment:
   Yes, this is supported in Spark Connect with the Scala client. I double-checked, as follows:
   <img width="829" alt="image" src="https://github.com/apache/spark/assets/15246973/f446090d-48c0-487a-a960-90b2d720b3b5">
   




Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #45692:
URL: https://github.com/apache/spark/pull/45692#discussion_r1538390440



Review Comment:
   Ah, okay, it's fixed in https://github.com/apache/spark/pull/45657




Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on PR #45692:
URL: https://github.com/apache/spark/pull/45692#issuecomment-2017428050

   cc @LuciferYang 



Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #45692:
URL: https://github.com/apache/spark/pull/45692#discussion_r1537318999



Review Comment:
   Are we really going to settle on this as formal behaviour? Is this supported in Spark Connect with the Scala client?


