You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/02 21:40:40 UTC

[GitHub] [spark] Gschiavon opened a new pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Gschiavon opened a new pull request #28984:
URL: https://github.com/apache/spark/pull/28984


   This is a WIP because I want to see what do you think about this.
   
   Now there is a `toJSON` function and I added `toJSON(options)` . I duplicated code now but the idea would be to have `toJSON()` and `toJSON(options)` so we could reuse all the code. The only problem is that adding the `()` to `toJSON` is a breaking change and I want to see what do you think about this.
   
   ### What changes were proposed in this pull request?
   Adding JSON Option for `toJSON` 
   
   
   ### Why are the changes needed?
   In order to be consistent with `to_json` and to be able to print null values when calling `toJSON`
   
   
   ### Does this PR introduce _any_ user-facing change?
   Depends, the idea would be to use the original `toJSON` but the function doesn`t accept parameters, and if I add them, could be a break change.
   
   ### How was this patch tested?
   Just added a test to show the use case
   
   Jira issue -> https://issues.apache.org/jira/browse/SPARK-32158
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #28984: [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #28984:
URL: https://github.com/apache/spark/pull/28984#issuecomment-664822205


   You can wait for other committers to review. Otherwise, I would just close.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28984:
URL: https://github.com/apache/spark/pull/28984#issuecomment-653932081


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28984:
URL: https://github.com/apache/spark/pull/28984#issuecomment-653931808


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r449872153



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1
+   */
+  def toJSON(options: Map[String, String]): Dataset[String] = {

Review comment:
       The built-in `to_json` is not enough for your usecase? cc: @HyukjinKwon 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r450016104



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1
+   */
+  def toJSON(options: Map[String, String]): Dataset[String] = {

Review comment:
       Thanks for the check, @HyukjinKwon. I'm neutral on this, too. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Gschiavon commented on pull request #28984: [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
Gschiavon commented on pull request #28984:
URL: https://github.com/apache/spark/pull/28984#issuecomment-664821178


   what should we do @HyukjinKwon ? I think it is simplier this way but I don't have a strong opinion either, as there are workarounds. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r449972804



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1
+   */
+  def toJSON(options: Map[String, String]): Dataset[String] = {

Review comment:
       I would deduplicate it by calling `toJSON(options: Map[String, String])` at `toJSON`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Gschiavon commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
Gschiavon commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r450003162



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1
+   */
+  def toJSON(options: Map[String, String]): Dataset[String] = {

Review comment:
       ok!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Gschiavon commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
Gschiavon commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r449876316



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1
+   */
+  def toJSON(options: Map[String, String]): Dataset[String] = {

Review comment:
       I guess so, you'd have to do something like this `df.select(to_json(struct("*"), Map("ignoreNullFields" -> "false")).as("value"))` to produce the same result as `df.toJSON(options)`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28984:
URL: https://github.com/apache/spark/pull/28984#issuecomment-653931808


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r449872473



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1

Review comment:
       note: we cannot add a new feature in v3.0.x, so not `3.0.1` but `3.1.0`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r449972627



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1
+   */
+  def toJSON(options: Map[String, String]): Dataset[String] = {

Review comment:
       I don't feel strongly here. I would rather don't extend `toJSON` anymore given that we added `to_json`, `from_json`, etc. to cover such general cases. I am okay if any committer supports it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Gschiavon closed pull request #28984: [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
Gschiavon closed pull request #28984:
URL: https://github.com/apache/spark/pull/28984


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r449972853



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3441,6 +3474,7 @@ class Dataset[T] private[sql](
     files.toSet.toArray
   }
 
+

Review comment:
       Let's remove unrelated newline changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Gschiavon commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
Gschiavon commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r449876245



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1

Review comment:
       ok!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #28984: [WIP] [SPARK-32158] [SQL] Add options to toJSON function

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #28984:
URL: https://github.com/apache/spark/pull/28984#discussion_r449879366



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3419,6 +3419,39 @@ class Dataset[T] private[sql](
     } (Encoders.STRING)
   }
 
+  /**
+   * Returns the content of the Dataset as a Dataset of JSON strings.
+   * @since 3.0.1
+   */
+  def toJSON(options: Map[String, String]): Dataset[String] = {

Review comment:
       Yea, I think so.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org