Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/03/07 16:28:11 UTC

[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/17192

    [SPARK-19849][SQL] Support ArrayType in to_json to produce JSON array

    ## What changes were proposed in this pull request?
    
    This PR proposes to support an array of structs in `to_json`, as below:
    
    ```scala
    import org.apache.spark.sql.functions._
    // Outside spark-shell, `import spark.implicits._` is also needed for `toDF` and `$`.
    
    val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a")
    df.select(to_json($"a").as("json")).show()
    ```
    
    ```
    +----------+
    |      json|
    +----------+
    |[{"_1":1}]|
    +----------+
    ```
    
    Currently, it throws an exception as below (a newline manually inserted for readability):
    
    ```
    org.apache.spark.sql.AnalysisException: cannot resolve 'structtojson(`array`)' due to data type 
    mismatch: structtojson requires that the expression is a struct expression.;;
    ```
    
    This allows the roundtrip with `from_json` as below:
    
    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._
    
    val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
    val df = Seq("""[{"a":1}, {"a":2}]""").toDF("json").select(from_json($"json", schema).as("array"))
    df.show()
    
    // Read back.
    df.select(to_json($"array").as("json")).show()
    ```
    
    ```
    +----------+
    |     array|
    +----------+
    |[[1], [2]]|
    +----------+
    
    +-----------------+
    |             json|
    +-----------------+
    |[{"a":1},{"a":2}]|
    +-----------------+
    ```
    
    
    Also, this PR proposes to rename `StructToJson` to `StructOrArrayToJson` and `JsonToStruct` to `JsonToStructOrArray`.
    
    ## How was this patch tested?
    
    Unit tests in `JsonFunctionsSuite` and `JsonExpressionsSuite` for Scala, doctest for Python and test in `test_sparkSQL.R` for R.
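
For readers without a Spark build at hand, the behaviour described above can be modelled in plain Python. This is only an illustrative sketch using the standard `json` module, not Spark's implementation: the `to_json` and `from_json` helpers here are hypothetical stand-ins that mimic the described semantics (a struct maps to a JSON object, an array of structs maps to a JSON array, anything else is rejected).

```python
import json


def to_json(value):
    """Serialize a struct (dict) or an array of structs (list of dicts).

    Hypothetical stand-in for the behaviour proposed in this PR; it is not
    the Spark function of the same name.
    """
    is_struct = isinstance(value, dict)
    is_array_of_structs = isinstance(value, list) and all(
        isinstance(item, dict) for item in value
    )
    if not (is_struct or is_array_of_structs):
        raise TypeError("to_json requires a struct or an array of structs")
    # Compact separators match the whitespace-free output shown above.
    return json.dumps(value, separators=(",", ":"))


def from_json(text):
    """Parse a JSON object or JSON array back into Python values."""
    return json.loads(text)


array = from_json('[{"a":1}, {"a":2}]')
print(to_json(array))      # [{"a":1},{"a":2}]
print(to_json({"_1": 1}))  # {"_1":1}
```

In this model, an array of non-structs such as `[1, 2]` raises a `TypeError`, which loosely mirrors the analysis-time rejection of unsupported types.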


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-19849

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17192
    
----
commit 92922650ca6bf44ef4f4daf02653f66125e881d2
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-03-07T14:43:37Z

    Support ArrayType in to_json to produce JSON array

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74324/
    Test PASSed.


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105838431
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}):
     @since(2.1)
     def to_json(col, options={}):
         """
    -    Converts a column containing a [[StructType]] into a JSON string. Throws an exception,
    -    in the case of an unsupported type.
    +    Converts a column containing a [[StructType]] or [[ArrayType]] into a JSON string.
    +    Throws an exception, in the case of an unsupported type.
    --- End diff --
    
    that's ok


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106566917
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -589,52 +591,69 @@ case class StructToJson(
       def this(child: Expression) = this(Map.empty, child, None)
       def this(child: Expression, options: Expression) =
         this(
    -      options = StructToJson.convertToMapData(options),
    +      options = StructOrArrayToJson.convertToMapData(options),
           child = child,
           timeZoneId = None)
     
       @transient
       lazy val writer = new CharArrayWriter()
     
       @transient
    -  lazy val gen =
    -    new JacksonGenerator(
    -      child.dataType.asInstanceOf[StructType],
    -      writer,
    -      new JSONOptions(options, timeZoneId.get))
    +  lazy val gen = new JacksonGenerator(
    +    rowSchema, writer, new JSONOptions(options, timeZoneId.get))
    +
    +  @transient
    +  lazy val rowSchema = child.dataType match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts rows to the JSON output according to the given schema.
    +  @transient
    +  lazy val converter: Any => String = {
    +    def getAndReset(): String = {
    +      gen.flush()
    +      val json = writer.toString
    +      writer.reset()
    +      json
    --- End diff --
    
    nit: why don't you perform `UTF8String.fromString(` here?
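
The `getAndReset` helper in the diff reuses one writer across rows: flush, read the text, then clear the buffer. A rough plain-Python analogue is sketched below, with the encoding step folded into the helper as the nit suggests. `ReusableJsonWriter` and its methods are hypothetical names for illustration; `io.StringIO` stands in for `CharArrayWriter`, and UTF-8 `bytes` stand in for `UTF8String`.

```python
import io
import json


class ReusableJsonWriter:
    """Hypothetical analogue of the writer/generator pair in the diff."""

    def __init__(self):
        self._writer = io.StringIO()  # stands in for CharArrayWriter

    def write(self, row):
        # Append the compact JSON text for one row to the shared buffer.
        json.dump(row, self._writer, separators=(",", ":"))

    def get_and_reset(self):
        # Read what was written, then clear the buffer for the next row.
        text = self._writer.getvalue()
        self._writer.seek(0)
        self._writer.truncate(0)
        # Encoding here means callers never handle the intermediate string.
        return text.encode("utf-8")


writer = ReusableJsonWriter()
writer.write({"a": 1})
first = writer.get_and_reset()
writer.write([{"a": 2}])
second = writer.get_and_reset()
print(first, second)  # b'{"a":1}' b'[{"a":2}]'
```

Because the buffer is cleared on every call, the second result contains only the second row, not a concatenation of both.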


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    @brkyvz and @marmbrus, I think the SQL tests and the R change are in good shape now, and this is ready for review. Could you take a look, please?


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    (I just rebased to resolve the conflicts)


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105566634
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1809,8 +1809,13 @@ setMethod("to_date",
     #' @export
     #' @examples
     #' \dontrun{
    -#' to_json(df$t, dateFormat = 'dd/MM/yyyy')
    -#' select(df, to_json(df$t))
    +#' # Converts a struct into a JSON object
    +#' df <- sql("SELECT named_struct('date', cast('2000-01-01' as date)) as d")
    +#' select(df, to_json(df$d, dateFormat = 'dd/MM/yyyy'))
    +#'
    +#' # Converts an array of structs into a JSON array
    +#' df <- sql("SELECT array(named_struct('date', cast('2000-01-01' as date))) as d")
    --- End diff --
    
    nit: maybe add multiple structs in the array instead of just one?


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74439/testReport)** for PR 17192 at commit [`e8cd20a`](https://github.com/apache/spark/commit/e8cd20a5f81e8465d324985d498fb742751805a9).


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105055922
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
    @@ -220,4 +242,5 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
         assert(errMsg2.getMessage.startsWith(
           "A type of keys and values in map() must be string, but got"))
       }
    +
    --- End diff --
    
    nit: delete the blank line?


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74718/testReport)** for PR 17192 at commit [`8c97406`](https://github.com/apache/spark/commit/8c97406b984ab68b74df2116547c1dbedb675785).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class JsonToStructs(`
      * `case class StructsToJson(`


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74598/testReport)** for PR 17192 at commit [`2362f81`](https://github.com/apache/spark/commit/2362f81ad36c384207ec658d01fc254b8af8b558).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105056816
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -422,7 +422,7 @@ object FunctionRegistry {
         expression[BitwiseXor]("^"),
     
         // json
    -    expression[StructToJson]("to_json"),
    +    expression[StructOrArrayToJson]("to_json"),
    --- End diff --
    
    It seems `StructOrArrayToJson` is a little ambiguous, because this `Array` means an array of structs, right? Would `StructsToJson` be better?


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74720/testReport)** for PR 17192 at commit [`703a6cb`](https://github.com/apache/spark/commit/703a6cb36ea920e87a3536f16572020c11197345).


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106793959
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -624,41 +627,58 @@ case class StructToJson(
       lazy val writer = new CharArrayWriter()
     
       @transient
    -  lazy val gen =
    -    new JacksonGenerator(
    -      child.dataType.asInstanceOf[StructType],
    -      writer,
    -      new JSONOptions(options, timeZoneId.get))
    +  lazy val gen = new JacksonGenerator(
    +    rowSchema, writer, new JSONOptions(options, timeZoneId.get))
    +
    +  @transient
    +  lazy val rowSchema = child.dataType match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    --- End diff --
    
    could we end up with `ArrayType(StructType, something_not_struct)` in this match?


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106793811
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -425,8 +425,8 @@ object FunctionRegistry {
         expression[BitwiseXor]("^"),
     
         // json
    -    expression[StructToJson]("to_json"),
    -    expression[JsonToStruct]("from_json"),
    +    expression[StructsToJson]("to_json"),
    +    expression[JsonToStructs]("from_json"),
    --- End diff --
    
    + @maropu @gatorsmile 


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74723/
    Test FAILed.


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105059270
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -422,7 +422,7 @@ object FunctionRegistry {
         expression[BitwiseXor]("^"),
     
         // json
    -    expression[StructToJson]("to_json"),
    +    expression[StructOrArrayToJson]("to_json"),
    --- End diff --
    
    I don't think that one is particularly better. If ambiguity is the only concern, `StructsToJson` does not convey that the input can be either an array or a single struct. Either way is fine with me. If a committer picks one, I will follow.


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106798713
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -2438,7 +2438,8 @@ setMethod("date_format", signature(y = "Column", x = "character"),
     #' from_json
     #'
     #' Parses a column containing a JSON string into a Column of \code{structType} with the specified
    -#' \code{schema}. If the string is unparseable, the Column will contains the value NA.
    +#' \code{schema} or array of \code{structType} if \code{asJsonArray} is enabled. If the string
    --- End diff --
    
    for clarity, I'd suggest saying
    `if \code{asJsonArray} is set to \code{TRUE}` 


[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106797619
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -624,41 +627,58 @@ case class StructToJson(
       lazy val writer = new CharArrayWriter()
     
       @transient
    -  lazy val gen =
    -    new JacksonGenerator(
    -      child.dataType.asInstanceOf[StructType],
    -      writer,
    -      new JSONOptions(options, timeZoneId.get))
    +  lazy val gen = new JacksonGenerator(
    +    rowSchema, writer, new JSONOptions(options, timeZoneId.get))
    +
    +  @transient
    +  lazy val rowSchema = child.dataType match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    --- End diff --
    
    Ah, it should be fine. That case is caught beforehand, at https://github.com/apache/spark/pull/17192/files/185ea6003d60feed20c56de61c17bc304663d99a#diff-6626026091295ad8c0dfb66ecbcd04b1R663.
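
The pattern under discussion, a partial match that is safe because an earlier type check rejects every other input, can be sketched in plain Python. The classes and the `check_input_data_type` and `row_schema` helpers below are simplified, hypothetical stand-ins; they do not reproduce the Spark code at the linked line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerType:
    pass


@dataclass(frozen=True)
class StructType:
    field_names: tuple


@dataclass(frozen=True)
class ArrayType:
    element_type: object
    contains_null: bool = True


def check_input_data_type(data_type):
    """Return an error message for unsupported types, or None if accepted."""
    if isinstance(data_type, StructType):
        return None
    if isinstance(data_type, ArrayType) and isinstance(
        data_type.element_type, StructType
    ):
        return None
    return "input must be a struct or an array of structs"


def row_schema(data_type):
    """Extract the struct schema; assumes the check above already passed."""
    if isinstance(data_type, StructType):
        return data_type
    # Only an array whose element type is a struct reaches this point.
    return data_type.element_type


struct = StructType(field_names=("a",))
for candidate in (struct, ArrayType(struct), ArrayType(IntegerType())):
    error = check_input_data_type(candidate)
    print(error if error else row_schema(candidate))
```

Running the check first means `row_schema` never sees an array of non-structs, so its two branches are exhaustive for every input that can reach it.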


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74202/
    Test PASSed.


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74732/testReport)** for PR 17192 at commit [`703a6cb`](https://github.com/apache/spark/commit/703a6cb36ea920e87a3536f16572020c11197345).


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74763/testReport)** for PR 17192 at commit [`185ea60`](https://github.com/apache/spark/commit/185ea6003d60feed20c56de61c17bc304663d99a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class JsonToStructs(`
      * `case class StructsToJson(`




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17192




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    retest this please




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106797920
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -425,8 +425,8 @@ object FunctionRegistry {
         expression[BitwiseXor]("^"),
     
         // json
    -    expression[StructToJson]("to_json"),
    -    expression[JsonToStruct]("from_json"),
    +    expression[StructsToJson]("to_json"),
    +    expression[JsonToStructs]("from_json"),
    --- End diff --
    
    (It was @maropu's initial suggestion, and @brkyvz, who can decide what to add, agreed on this. It should be fine.)




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105059294
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out ---
    @@ -32,32 +34,40 @@ Usage: to_json(expr[, options]) - Returns a json string with a given struct valu
     -- !query 2
     select to_json(named_struct('a', 1, 'b', 2))
     -- !query 2 schema
    -struct<structtojson(named_struct(a, 1, b, 2)):string>
    +struct<structorarraytojson(named_struct(a, 1, b, 2)):string>
     -- !query 2 output
     {"a":1,"b":2}
     
     
     -- !query 3
     select to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'))
     -- !query 3 schema
    -struct<structtojson(named_struct(time, to_timestamp('2015-08-26', 'yyyy-MM-dd'))):string>
    +struct<structorarraytojson(named_struct(time, to_timestamp('2015-08-26', 'yyyy-MM-dd'))):string>
     -- !query 3 output
     {"time":"26/08/2015"}
     
     
     -- !query 4
    -select to_json(named_struct('a', 1, 'b', 2), named_struct('mode', 'PERMISSIVE'))
    +select to_json(array(named_struct('a', 1, 'b', 2)))
    --- End diff --
    
    Thank you for your confirmation.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106798694
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -624,41 +627,58 @@ case class StructToJson(
       lazy val writer = new CharArrayWriter()
     
       @transient
    -  lazy val gen =
    -    new JacksonGenerator(
    -      child.dataType.asInstanceOf[StructType],
    -      writer,
    -      new JSONOptions(options, timeZoneId.get))
    +  lazy val gen = new JacksonGenerator(
    +    rowSchema, writer, new JSONOptions(options, timeZoneId.get))
    +
    +  @transient
    +  lazy val rowSchema = child.dataType match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts rows to the JSON output according to the given schema.
    +  @transient
    +  lazy val converter: Any => UTF8String = {
    +    def getAndReset(): UTF8String = {
    +      gen.flush()
    +      val json = writer.toString
    +      writer.reset()
    +      UTF8String.fromString(json)
    +    }
    +
    +    child.dataType match {
    +      case _: StructType =>
    +        (row: Any) =>
    +          gen.write(row.asInstanceOf[InternalRow])
    +          getAndReset()
    +      case ArrayType(_: StructType, _) =>
    +        (arr: Any) =>
    +          gen.write(arr.asInstanceOf[ArrayData])
    +          getAndReset()
    +    }
    +  }
     
       override def dataType: DataType = StringType
     
    -  override def checkInputDataTypes(): TypeCheckResult = {
    -    if (StructType.acceptsType(child.dataType)) {
    +  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    --- End diff --
    
    right, thanks for testing this out. 
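
The dispatch in the diff above can be sketched without Spark. The following is a minimal model, assuming simplified stand-in types: `DataType`, `StructType`, `ArrayType`, `IntType`, `ToJsonSketch`, and `writeRow` are hypothetical names for illustration, not Spark's Catalyst classes, and rows are plain `Seq[Int]` values. It only shows the idea of choosing the converter once from the input type and rejecting anything that is not a struct or an array of structs; the actual patch writes through a shared `JacksonGenerator` and `CharArrayWriter` instead of building strings.

```scala
// Simplified stand-ins for Catalyst data types (hypothetical, not Spark's classes).
sealed trait DataType
case object IntType extends DataType
final case class StructType(fieldNames: Seq[String]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType

object ToJsonSketch {
  // Render one row as a JSON object; every field is an Int in this model.
  private def writeRow(schema: StructType, row: Seq[Int]): String =
    schema.fieldNames.zip(row)
      .map { case (name, value) => "\"" + name + "\":" + value }
      .mkString("{", ",", "}")

  // Choose the converter once, based on the input type, mirroring the
  // `child.dataType match` in the diff. Unsupported types yield a Left
  // instead of a converter.
  def converter(dt: DataType): Either[String, Any => String] = dt match {
    case st: StructType =>
      Right((row: Any) => writeRow(st, row.asInstanceOf[Seq[Int]]))
    case ArrayType(st: StructType) =>
      Right((arr: Any) =>
        arr.asInstanceOf[Seq[Seq[Int]]].map(writeRow(st, _)).mkString("[", ",", "]"))
    case other =>
      Left(s"unsupported input type: $other")
  }
}
```

In this model, `ToJsonSketch.converter(ArrayType(StructType(Seq("a"))))` yields a function that turns `Seq(Seq(1), Seq(2))` into `[{"a":1},{"a":2}]`, while `ArrayType(IntType)` is rejected.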




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Thank you @felixcheung, @brkyvz and @maropu. 




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106567845
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
    @@ -37,6 +37,11 @@ private[sql] class JacksonGenerator(
     
       // `ValueWriter`s for all fields of the schema
       private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray
    +  // `ValueWriter` for array data storing rows of the schema.
    +  private val arrElementWriter: ValueWriter = {
    +    (arr: SpecializedGetters, i: Int) =>
    --- End diff --
    
    ultra nit: I would move this above like:
    
    ```scala
    private val arrElementWriter: ValueWriter = (arr: SpecializedGetters, i: Int) => {
      ...
    }
    ```




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    retest this please




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74112/testReport)** for PR 17192 at commit [`9292265`](https://github.com/apache/spark/commit/92922650ca6bf44ef4f4daf02653f66125e881d2).




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105566685
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}):
     @since(2.1)
     def to_json(col, options={}):
         """
    -    Converts a column containing a [[StructType]] into a JSON string. Throws an exception,
    -    in the case of an unsupported type.
    +    Converts a column containing a [[StructType]] or [[ArrayType]] into a JSON string.
    +    Throws an exception, in the case of an unsupported type.
    --- End diff --
    
    nit: for Python and Scala, the doc wording here can be confusing. Calling `to_json()` by itself doesn't actually throw an exception right there, but it might when the column is being resolved, as with `select(to_json...)`
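
The timing described in this comment can be illustrated with a small model that does not depend on Spark. Everything here is hypothetical (`Column`, `AnalysisError`, `LazyAnalysisSketch`, and the string-encoded types are stand-ins, not Spark's API): building the expression only records it, and the type check runs when the column is resolved.

```scala
// Hypothetical miniature model of lazy column resolution; not Spark's API.
final case class AnalysisError(message: String) extends Exception(message)

// A column only records an expression and the type of its input.
final case class Column(expr: String, inputType: String)

object LazyAnalysisSketch {
  // Building the expression performs no validation, so nothing is thrown here.
  def toJson(input: Column): Column =
    Column(s"to_json(${input.expr})", input.inputType)

  // Resolution is where the type check runs and where the error surfaces.
  def select(col: Column): String =
    if (col.inputType == "struct" || col.inputType == "array<struct>") col.expr
    else throw AnalysisError(s"cannot resolve '${col.expr}' due to data type mismatch")
}
```

With this model, `LazyAnalysisSketch.toJson(Column("a", "int"))` returns normally, and only passing the result to `select` raises the error.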




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74268/testReport)** for PR 17192 at commit [`ed0bbae`](https://github.com/apache/spark/commit/ed0bbaeff6f842e694ac3fbfc1f6cc9895c4de2f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105054224
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
    @@ -220,4 +242,5 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
         assert(errMsg2.getMessage.startsWith(
           "A type of keys and values in map() must be string, but got"))
       }
    +
    --- End diff --
    
    I found that a newline at the end of tests prevents a conflict in some cases (_if I am not mistaken_). I am happy to revert this change if anyone is sure that it is useless or feels that it is an unrelated change.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105336776
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -1339,6 +1339,11 @@ test_that("column functions", {
       expect_equal(collect(select(df, bround(df$x, 0)))[[1]][2], 4)
     
       # Test to_json(), from_json()
    +  arr <- list(listToStruct(list("name" = "bob")))
    +  df <- as.DataFrame(list(listToStruct(list("people" = arr))))
    +  j <- collect(select(df, alias(to_json(df$people), "json")))
    +  expect_equal(j[order(j$json), ][1], "[{\"name\":\"bob\"}]")
    +
    --- End diff --
    
    possibly if it does




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74112/testReport)** for PR 17192 at commit [`9292265`](https://github.com/apache/spark/commit/92922650ca6bf44ef4f4daf02653f66125e881d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class JsonToStructOrArray(`
      * `case class StructOrArrayToJson(`




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74268/testReport)** for PR 17192 at commit [`ed0bbae`](https://github.com/apache/spark/commit/ed0bbaeff6f842e694ac3fbfc1f6cc9895c4de2f).




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Thank you @felixcheung for your review and for moving this forward.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    cc @brkyvz and @marmbrus, could you please take a look and see if it makes sense?




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105059417
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -422,7 +422,7 @@ object FunctionRegistry {
         expression[BitwiseXor]("^"),
     
         // json
    -    expression[StructToJson]("to_json"),
    +    expression[StructOrArrayToJson]("to_json"),
    --- End diff --
    
    okay, thanks!




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    merged to master




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106567643
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}):
     @since(2.1)
     def to_json(col, options={}):
         """
    -    Converts a column containing a [[StructType]] into a JSON string. Throws an exception,
    -    in the case of an unsupported type.
    +    Converts a column containing a [[StructType]] or [[ArrayType]] into a JSON string.
    +    Throws an exception, in the case of an unsupported type.
     
    -    :param col: name of column containing the struct
    +    :param col: name of column containing the struct or an array of the structs
    --- End diff --
    
    `array of structs`




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105605336
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}):
     @since(2.1)
     def to_json(col, options={}):
         """
    -    Converts a column containing a [[StructType]] into a JSON string. Throws an exception,
    -    in the case of an unsupported type.
    +    Converts a column containing a [[StructType]] or [[ArrayType]] into a JSON string.
    +    Throws an exception, in the case of an unsupported type.
    --- End diff --
    
    If you don't feel strongly about this, let me leave it as is (and sweep it up in another PR). In most cases, I believe users will use it directly with `select`.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106797634
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1774,10 +1774,11 @@ def json_tuple(col, *fields):
     def from_json(col, schema, options={}):
         """
         Parses a column containing a JSON string into a [[StructType]] or [[ArrayType]]
    -    with the specified schema. Returns `null`, in the case of an unparseable string.
    +    of [[StructType]]s with the specified schema. Returns `null`, in the case of an unparseable
    +    string.
    --- End diff --
    
    Sure, let me try.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106567525
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}):
     @since(2.1)
     def to_json(col, options={}):
         """
    -    Converts a column containing a [[StructType]] into a JSON string. Throws an exception,
    -    in the case of an unsupported type.
    +    Converts a column containing a [[StructType]] or [[ArrayType]] into a JSON string.
    --- End diff --
    
    We should mention that the `ArrayType` must be an array of `StructType`




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74718/
    Test FAILed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74806/testReport)** for PR 17192 at commit [`5d390e7`](https://github.com/apache/spark/commit/5d390e7ed34b5de2e264c5f116867a77de39f2ec).




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105052901
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out ---
    @@ -32,32 +34,40 @@ Usage: to_json(expr[, options]) - Returns a json string with a given struct valu
     -- !query 2
     select to_json(named_struct('a', 1, 'b', 2))
     -- !query 2 schema
    -struct<structtojson(named_struct(a, 1, b, 2)):string>
    +struct<structorarraytojson(named_struct(a, 1, b, 2)):string>
     -- !query 2 output
     {"a":1,"b":2}
     
     
     -- !query 3
     select to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'))
     -- !query 3 schema
    -struct<structtojson(named_struct(time, to_timestamp('2015-08-26', 'yyyy-MM-dd'))):string>
    +struct<structorarraytojson(named_struct(time, to_timestamp('2015-08-26', 'yyyy-MM-dd'))):string>
     -- !query 3 output
     {"time":"26/08/2015"}
     
     
     -- !query 4
    -select to_json(named_struct('a', 1, 'b', 2), named_struct('mode', 'PERMISSIVE'))
    +select to_json(array(named_struct('a', 1, 'b', 2)))
    --- End diff --
    
    Cc @maropu, this adds a test in the file you recently wrote. Could you check if this follows your indention?




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74267/testReport)** for PR 17192 at commit [`708e155`](https://github.com/apache/spark/commit/708e155784cd4da55935a3b3bc9eb19cee609851).




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    I will do another pass today and merge if others do not have more concerns?




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74763/testReport)** for PR 17192 at commit [`185ea60`](https://github.com/apache/spark/commit/185ea6003d60feed20c56de61c17bc304663d99a).




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106797757
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -624,41 +627,58 @@ case class StructToJson(
       lazy val writer = new CharArrayWriter()
     
       @transient
    -  lazy val gen =
    -    new JacksonGenerator(
    -      child.dataType.asInstanceOf[StructType],
    -      writer,
    -      new JSONOptions(options, timeZoneId.get))
    +  lazy val gen = new JacksonGenerator(
    +    rowSchema, writer, new JSONOptions(options, timeZoneId.get))
    +
    +  @transient
    +  lazy val rowSchema = child.dataType match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts rows to the JSON output according to the given schema.
    +  @transient
    +  lazy val converter: Any => UTF8String = {
    +    def getAndReset(): UTF8String = {
    +      gen.flush()
    +      val json = writer.toString
    +      writer.reset()
    +      UTF8String.fromString(json)
    +    }
    +
    +    child.dataType match {
    +      case _: StructType =>
    +        (row: Any) =>
    +          gen.write(row.asInstanceOf[InternalRow])
    +          getAndReset()
    +      case ArrayType(_: StructType, _) =>
    +        (arr: Any) =>
    +          gen.write(arr.asInstanceOf[ArrayData])
    +          getAndReset()
    +    }
    +  }
     
       override def dataType: DataType = StringType
     
    -  override def checkInputDataTypes(): TypeCheckResult = {
    -    if (StructType.acceptsType(child.dataType)) {
    +  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    --- End diff --
    
    It seems `StructType.acceptsType` and `ArrayType.acceptsType` call `isInstanceOf[StructType]` and `isInstanceOf[ArrayType]`. To my knowledge, `isInstanceOf` and pattern matching are interchangeable in most cases.
    
    (I just found a reference: https://www.safaribooksonline.com/library/view/scala-cookbook/9781449340292/ch03s15.html)
    
    Namely, the case below should be fine. (To my knowledge, Scala forbids case-to-case inheritance, BTW.)
    
    ```
    scala> case class A()
    defined class A
    
    scala> class B extends A
    defined class B
    
    scala> new B() match {case _: A => println(1)}
    1
    ```
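    
    A minimal, Spark-free sketch of the same point applied to the check in this diff. The `DataType`, `IntegerType`, `StructType` and `ArrayType` definitions below are toy stand-ins written only for illustration (assumptions, not Spark's classes), and `TypeCheckSketch` is a hypothetical name. It compares an `isInstanceOf`-based check with the pattern match used in the proposed `checkInputDataTypes`:
    
    ```scala
    // Toy stand-ins for illustration only; these are NOT Spark's own classes.
    sealed trait DataType
    case object IntegerType extends DataType
    final case class StructType(fieldNames: Seq[String]) extends DataType
    final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
    
    object TypeCheckSketch {
      // The check written with isInstanceOf / asInstanceOf.
      def acceptsViaIsInstanceOf(dt: DataType): Boolean =
        dt.isInstanceOf[StructType] ||
          (dt.isInstanceOf[ArrayType] &&
            dt.asInstanceOf[ArrayType].elementType.isInstanceOf[StructType])
    
      // The same check written as a pattern match, mirroring the shape used in the diff.
      def acceptsViaPattern(dt: DataType): Boolean = dt match {
        case _: StructType | ArrayType(_: StructType, _) => true
        case _ => false
      }
    }
    ```
    
    Under these toy definitions both functions accept a struct and an array of structs and reject an array of integers, so the two styles should be interchangeable for this particular check.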
    





[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105097834
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -1339,6 +1339,11 @@ test_that("column functions", {
       expect_equal(collect(select(df, bround(df$x, 0)))[[1]][2], 4)
     
       # Test to_json(), from_json()
    +  arr <- list(listToStruct(list("name" = "bob")))
    +  df <- as.DataFrame(list(listToStruct(list("people" = arr))))
    +  j <- collect(select(df, alias(to_json(df$people), "json")))
    +  expect_equal(j[order(j$json), ][1], "[{\"name\":\"bob\"}]")
    +
    --- End diff --
    
    It might be a bit hard for regular users to create the DataFrame this way - is there a different way that still produces a column that works with `to_json`?
    
    that should probably be in the example in functions.R?




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74806/testReport)** for PR 17192 at commit [`5d390e7`](https://github.com/apache/spark/commit/5d390e7ed34b5de2e264c5f116867a77de39f2ec).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106567583
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1795,10 +1795,10 @@ setMethod("to_date",
     
     #' to_json
     #'
    -#' Converts a column containing a \code{structType} into a Column of JSON string.
    -#' Resolving the Column can fail if an unsupported type is encountered.
    +#' Converts a column containing a \code{structType} or an array of \code{structType} into a Column
    +#' of JSON string. Resolving the Column can fail if an unsupported type is encountered.
     #'
    -#' @param x Column containing the struct
    +#' @param x Column containing the struct or an array of the structs
    --- End diff --
    
    nit: `array of structs`




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74720/testReport)** for PR 17192 at commit [`703a6cb`](https://github.com/apache/spark/commit/703a6cb36ea920e87a3536f16572020c11197345).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74324/testReport)** for PR 17192 at commit [`3a992f3`](https://github.com/apache/spark/commit/3a992f3507492992500329b09ef94e14f4434e59).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74268/
    Test PASSed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74814/
    Test PASSed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74267/testReport)** for PR 17192 at commit [`708e155`](https://github.com/apache/spark/commit/708e155784cd4da55935a3b3bc9eb19cee609851).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105566705
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1814,6 +1814,10 @@ def to_json(col, options={}):
         >>> df = spark.createDataFrame(data, ("key", "value"))
         >>> df.select(to_json(df.value).alias("json")).collect()
         [Row(json=u'{"age":2,"name":"Alice"}')]
    +    >>> data = [(1, [Row(name='Alice', age=2)])]
    --- End diff --
    
    ditto - I'd suggest adding multiple structs in the array
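    
    One way an example with several structs in the array could look on the Scala side, as a sketch that assumes a Spark build including this PR is on the classpath. `ToJsonArraySketch` and `arrayOfStructsToJson` are hypothetical names; the tuple-based DataFrame construction follows the PR description:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.to_json
    
    object ToJsonArraySketch {
      // Builds a one-row DataFrame whose column "a" is an array holding two structs,
      // serializes that column with to_json, and collects the resulting JSON strings.
      def arrayOfStructsToJson(spark: SparkSession): Seq[String] = {
        import spark.implicits._
        val df = Seq(Tuple1(Seq(Tuple1(1), Tuple1(2)))).toDF("a")
        df.select(to_json($"a").as("json")).as[String].collect().toSeq
      }
    
      def main(args: Array[String]): Unit = {
        // A local single-threaded session is assumed here purely for the sketch.
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("to_json-array-sketch")
          .getOrCreate()
        try arrayOfStructsToJson(spark).foreach(println)
        finally spark.stop()
      }
    }
    ```
    
    Following the output format shown in the PR description, each row should come back as a single JSON array containing one object per struct.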




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74811/testReport)** for PR 17192 at commit [`6ac57d3`](https://github.com/apache/spark/commit/6ac57d3a2b8d5df9851aa06e8e2fb6fafb76005d).




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105055860
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out ---
    @@ -32,32 +34,40 @@ Usage: to_json(expr[, options]) - Returns a json string with a given struct valu
     -- !query 2
     select to_json(named_struct('a', 1, 'b', 2))
     -- !query 2 schema
    -struct<structtojson(named_struct(a, 1, b, 2)):string>
    +struct<structorarraytojson(named_struct(a, 1, b, 2)):string>
     -- !query 2 output
     {"a":1,"b":2}
     
     
     -- !query 3
     select to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'))
     -- !query 3 schema
    -struct<structtojson(named_struct(time, to_timestamp('2015-08-26', 'yyyy-MM-dd'))):string>
    +struct<structorarraytojson(named_struct(time, to_timestamp('2015-08-26', 'yyyy-MM-dd'))):string>
     -- !query 3 output
     {"time":"26/08/2015"}
     
     
     -- !query 4
    -select to_json(named_struct('a', 1, 'b', 2), named_struct('mode', 'PERMISSIVE'))
    +select to_json(array(named_struct('a', 1, 'b', 2)))
    --- End diff --
    
    yea, it looks okay!




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105602228
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1802,10 +1802,10 @@ def from_json(col, schema, options={}):
     @since(2.1)
     def to_json(col, options={}):
         """
    -    Converts a column containing a [[StructType]] into a JSON string. Throws an exception,
    -    in the case of an unsupported type.
    +    Converts a column containing a [[StructType]] or [[ArrayType]] into a JSON string.
    +    Throws an exception, in the case of an unsupported type.
    --- End diff --
    
    Hm, in that sense, there are many instances, for example, `Returns the double value`, `Returns the positive value` or ``Returns `null` ...``. They all return a column. Another example is `Extracts json object ...`. It does not extract the object right there but makes a column.
    
    
     




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74324/testReport)** for PR 17192 at commit [`3a992f3`](https://github.com/apache/spark/commit/3a992f3507492992500329b09ef94e14f4434e59).




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105052696
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -1339,6 +1339,11 @@ test_that("column functions", {
       expect_equal(collect(select(df, bround(df$x, 0)))[[1]][2], 4)
     
       # Test to_json(), from_json()
    +  arr <- list(listToStruct(list("name" = "bob")))
    +  df <- as.DataFrame(list(listToStruct(list("people" = arr))))
    +  j <- collect(select(df, alias(to_json(df$people), "json")))
    +  expect_equal(j[order(j$json), ][1], "[{\"name\":\"bob\"}]")
    +
    --- End diff --
    
    @felixcheung, there is a little bit of R code here; could you check the test and documentation?




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105336865
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -1339,6 +1339,11 @@ test_that("column functions", {
       expect_equal(collect(select(df, bround(df$x, 0)))[[1]][2], 4)
     
       # Test to_json(), from_json()
    +  arr <- list(listToStruct(list("name" = "bob")))
    +  df <- as.DataFrame(list(listToStruct(list("people" = arr))))
    +  j <- collect(select(df, alias(to_json(df$people), "json")))
    +  expect_equal(j[order(j$json), ][1], "[{\"name\":\"bob\"}]")
    +
    --- End diff --
    
    yes `listToStruct` is internal and it's mucking with types (though it's legitimate to do in R)




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74806/
    Test PASSed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74720/
    Test FAILed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74723/testReport)** for PR 17192 at commit [`703a6cb`](https://github.com/apache/spark/commit/703a6cb36ea920e87a3536f16572020c11197345).




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106567410
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -422,7 +422,7 @@ object FunctionRegistry {
         expression[BitwiseXor]("^"),
     
         // json
    -    expression[StructToJson]("to_json"),
    +    expression[StructOrArrayToJson]("to_json"),
    --- End diff --
    
    Since this is an internal name, I'm fine either way, but I prefer `StructsToJson` because we can't use this method on an array of integers. I feel `StructOrArrayToJson` is a bit more ambiguous.
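
To make the naming concern concrete, here is a minimal sketch of the kind of type check this PR introduces: a struct or an array of structs is accepted, while an array of integers is not. The `DataType` classes below are simplified stand-ins defined only for this example (they are not Spark's real classes), and `acceptedByToJson` is a hypothetical helper name.

```scala
object ToJsonTypeCheck {
  // Simplified stand-ins for Spark's data types (hypothetical, for illustration only).
  sealed trait DataType
  case object IntegerType extends DataType
  final case class StructType(fieldNames: List[String]) extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  // Accept a struct, or an array whose element type is a struct; reject everything else.
  def acceptedByToJson(dt: DataType): Boolean = dt match {
    case _: StructType | ArrayType(_: StructType) => true
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    val person = StructType(List("name"))
    println(acceptedByToJson(person))                 // true: a struct
    println(acceptedByToJson(ArrayType(person)))      // true: an array of structs
    println(acceptedByToJson(ArrayType(IntegerType))) // false: an array of integers
  }
}
```

Under this check, "array" in the expression name really means "array of structs", which is the ambiguity the comment points at.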




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74598/
    Test PASSed.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105337154
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -1339,6 +1339,11 @@ test_that("column functions", {
       expect_equal(collect(select(df, bround(df$x, 0)))[[1]][2], 4)
     
       # Test to_json(), from_json()
    +  arr <- list(listToStruct(list("name" = "bob")))
    +  df <- as.DataFrame(list(listToStruct(list("people" = arr))))
    +  j <- collect(select(df, alias(to_json(df$people), "json")))
    +  expect_equal(j[order(j$json), ][1], "[{\"name\":\"bob\"}]")
    +
    --- End diff --
    
    Let me go with this for now and try to find another way before it is merged.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r105101799
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -1339,6 +1339,11 @@ test_that("column functions", {
       expect_equal(collect(select(df, bround(df$x, 0)))[[1]][2], 4)
     
       # Test to_json(), from_json()
    +  arr <- list(listToStruct(list("name" = "bob")))
    +  df <- as.DataFrame(list(listToStruct(list("people" = arr))))
    +  j <- collect(select(df, alias(to_json(df$people), "json")))
    +  expect_equal(j[order(j$json), ][1], "[{\"name\":\"bob\"}]")
    +
    --- End diff --
    
    Sure, let me add it if I find a better way (it seems `listToStruct` is even an internal API). Is it okay to use `sql("SELECT array(named_struct('name','bob')) as people")` instead if it works fine?




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74598/testReport)** for PR 17192 at commit [`2362f81`](https://github.com/apache/spark/commit/2362f81ad36c384207ec658d01fc254b8af8b558).




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    retest this please




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74202/testReport)** for PR 17192 at commit [`ed0bbae`](https://github.com/apache/spark/commit/ed0bbaeff6f842e694ac3fbfc1f6cc9895c4de2f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74718/testReport)** for PR 17192 at commit [`8c97406`](https://github.com/apache/spark/commit/8c97406b984ab68b74df2116547c1dbedb675785).




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74732/testReport)** for PR 17192 at commit [`703a6cb`](https://github.com/apache/spark/commit/703a6cb36ea920e87a3536f16572020c11197345).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74814/testReport)** for PR 17192 at commit [`6ac57d3`](https://github.com/apache/spark/commit/6ac57d3a2b8d5df9851aa06e8e2fb6fafb76005d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106793800
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1774,10 +1774,11 @@ def json_tuple(col, *fields):
     def from_json(col, schema, options={}):
         """
         Parses a column containing a JSON string into a [[StructType]] or [[ArrayType]]
    -    with the specified schema. Returns `null`, in the case of an unparseable string.
    +    of [[StructType]]s with the specified schema. Returns `null`, in the case of an unparseable
    +    string.
    --- End diff --
    
    If we are updating this in Python, should we update R too?
    https://github.com/HyukjinKwon/spark/blob/185ea6003d60feed20c56de61c17bc304663d99a/R/pkg/R/functions.R#L2440





[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74763/
    Test PASSed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74267/
    Test PASSed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74814/testReport)** for PR 17192 at commit [`6ac57d3`](https://github.com/apache/spark/commit/6ac57d3a2b8d5df9851aa06e8e2fb6fafb76005d).




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r104886708
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out ---
    @@ -32,32 +34,40 @@ Usage: to_json(expr[, options]) - Returns a json string with a given struct valu
     -- !query 2
     select to_json(named_struct('a', 1, 'b', 2))
     -- !query 2 schema
    -struct<structtojson(named_struct(a, 1, b, 2)):string>
    +struct<structorarraytojson(named_struct(a, 1, b, 2)):string>
     -- !query 2 output
     {"a":1,"b":2}
     
     
     -- !query 3
     select to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'))
     -- !query 3 schema
    -struct<structtojson(named_struct(time, to_timestamp('2015-08-26', 'yyyy-MM-dd'))):string>
    +struct<structorarraytojson(named_struct(time, to_timestamp('2015-08-26', 'yyyy-MM-dd'))):string>
     -- !query 3 output
     {"time":"26/08/2015"}
     
     
     -- !query 4
    -select to_json(named_struct('a', 1, 'b', 2), named_struct('mode', 'PERMISSIVE'))
    +select to_json(array(named_struct('a', 1, 'b', 2)))
     -- !query 4 schema
    -struct<>
    +struct<structorarraytojson(array(named_struct(a, 1, b, 2))):string>
     -- !query 4 output
    -org.apache.spark.sql.AnalysisException
    -Must use a map() function for options;; line 1 pos 7
    +[{"a":1,"b":2}]
     
     
     -- !query 5
    -select to_json()
    +select to_json(named_struct('a', 1, 'b', 2), named_struct('mode', 'PERMISSIVE'))
    --- End diff --
    
    This diff looks a bit odd, but I had to add the new query in the middle of the SQL file (https://github.com/apache/spark/pull/17192/files#diff-3b8a538abd658a260aa32c4aa593bed7R6) to show that it is not one of the error cases.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    **[Test build #74439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74439/testReport)** for PR 17192 at commit [`e8cd20a`](https://github.com/apache/spark/commit/e8cd20a5f81e8465d324985d498fb742751805a9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17192#discussion_r106794013
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -624,41 +627,58 @@ case class StructToJson(
       lazy val writer = new CharArrayWriter()
     
       @transient
    -  lazy val gen =
    -    new JacksonGenerator(
    -      child.dataType.asInstanceOf[StructType],
    -      writer,
    -      new JSONOptions(options, timeZoneId.get))
    +  lazy val gen = new JacksonGenerator(
    +    rowSchema, writer, new JSONOptions(options, timeZoneId.get))
    +
    +  @transient
    +  lazy val rowSchema = child.dataType match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts rows to the JSON output according to the given schema.
    +  @transient
    +  lazy val converter: Any => UTF8String = {
    +    def getAndReset(): UTF8String = {
    +      gen.flush()
    +      val json = writer.toString
    +      writer.reset()
    +      UTF8String.fromString(json)
    +    }
    +
    +    child.dataType match {
    +      case _: StructType =>
    +        (row: Any) =>
    +          gen.write(row.asInstanceOf[InternalRow])
    +          getAndReset()
    +      case ArrayType(_: StructType, _) =>
    +        (arr: Any) =>
    +          gen.write(arr.asInstanceOf[ArrayData])
    +          getAndReset()
    +    }
    +  }
     
       override def dataType: DataType = StringType
     
    -  override def checkInputDataTypes(): TypeCheckResult = {
    -    if (StructType.acceptsType(child.dataType)) {
    +  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    --- End diff --
    
    This seems to have become a bit stricter than before?
    was: `StructType.acceptsType` - anything can be accepted as struct
    now: `match { case _: StructType` - must be struct
    
    Am I understanding this correctly?
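
For context on the rest of the quoted diff: the new `converter` picks a conversion function once, based on the child's data type, and reuses a single writer across rows by resetting it after each value. A minimal sketch of that pattern follows. It assumes hypothetical stand-ins (`Row`, `writeRow`, `writeArray`, `converterFor`) in place of Spark's `InternalRow`, `ArrayData` and `JacksonGenerator`, and it omits the generator flush that the real code performs before reading the buffer.

```scala
import java.io.CharArrayWriter

object ConverterSketch {
  // Hypothetical stand-in for a struct row: ordered (field name, value) pairs.
  final case class Row(fields: Seq[(String, Int)])

  // One writer shared by every conversion, as in the diff.
  private val writer = new CharArrayWriter()

  // Return what has been written so far and clear the buffer for the next value.
  private def getAndReset(): String = {
    val json = writer.toString
    writer.reset()
    json
  }

  // Write a single row as a JSON object.
  private def writeRow(row: Row): Unit =
    writer.write(row.fields.map { case (k, v) => "\"" + k + "\":" + v }.mkString("{", ",", "}"))

  // Write a sequence of rows as a JSON array of objects.
  private def writeArray(rows: Seq[Row]): Unit = {
    writer.write("[")
    rows.zipWithIndex.foreach { case (row, i) =>
      if (i > 0) writer.write(",")
      writeRow(row)
    }
    writer.write("]")
  }

  // Choose the conversion function once, instead of matching on the type for every value.
  def converterFor(isArray: Boolean): Any => String =
    if (isArray) {
      (arr: Any) => { writeArray(arr.asInstanceOf[Seq[Row]]); getAndReset() }
    } else {
      (row: Any) => { writeRow(row.asInstanceOf[Row]); getAndReset() }
    }

  def main(args: Array[String]): Unit = {
    val structToJson = converterFor(isArray = false)
    val arrayToJson = converterFor(isArray = true)
    println(structToJson(Row(Seq("a" -> 1, "b" -> 2))))            // {"a":1,"b":2}
    println(arrayToJson(Seq(Row(Seq("a" -> 1)), Row(Seq("a" -> 2))))) // [{"a":1},{"a":2}]
  }
}
```

Resetting the shared buffer after each value is what keeps the output of one row from leaking into the next.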




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74732/
    Test PASSed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74112/
    Test PASSed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74811/
    Test FAILed.




[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74439/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org