Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/02/14 16:41:53 UTC

[GitHub] spark pull request #16929: [SPARK-19595][SQL] Do not allow json array in fro...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/16929

    [SPARK-19595][SQL] Do not allow json array in from_json

    ## What changes were proposed in this pull request?
    
    Currently, `from_json` reads only a single row (the first element) when the input is a JSON array. So, the code below:
    
    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._
    val schema = StructType(StructField("a", IntegerType) :: Nil)
    Seq(("""[{"a": 1}, {"a": 2}]""")).toDF("struct").select(from_json(col("struct"), schema)).show()
    ```
    prints 
    
    ```
    +--------------------+
    |jsontostruct(struct)|
    +--------------------+
    |                 [1]|
    +--------------------+
    ```
    We could consider supporting this as a generator expression, but that is arguable. So, this PR simply proposes disallowing JSON arrays in `from_json` for now.
    
    **After**
    
    ```
    +--------------------+
    |jsontostruct(struct)|
    +--------------------+
    |                null|
    +--------------------+
    ``` 
    
    ## How was this patch tested?
    
    Unit test in `JsonExpressionsSuite` and manual test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark disallow-array

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16929.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16929
    
----
commit acbce26cd983c4e3510a8db707196e3cd848aba2
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-02-14T15:37:00Z

    Do not allow json array in from_json

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73474/testReport)** for PR 16929 at commit [`25086ed`](https://github.com/apache/spark/commit/25086edc90b2bad7e27e8897108426fbd29dc00f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73475/testReport)** for PR 16929 at commit [`2aaf609`](https://github.com/apache/spark/commit/2aaf609c70877f40c7f5d179ca42463abc666bac).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Thanks @HyukjinKwon, I took a pass. My comments are mainly:
    
    1. We don't need to support APIs for both `StructType` and `ArrayType`. I would rather just add an API for `DataType` and `require` that the `DataType` is either `StructType` or `ArrayType`.
    
    2. If a user specifies the schema as an `Array` but one of the rows has a JSON object, we should still consider it an Array of records. No need to separate `Array support` and `Object support`.
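
The single-entry-point idea in point 1 can be sketched in plain Scala as follows. This is only an illustration under stated assumptions, not Spark's code: the `DataType`, `StructType`, and `ArrayType` definitions are toy stand-ins for the real classes, and `SchemaCheck.validate` is a hypothetical helper name.

```scala
// Toy stand-ins for Spark's type hierarchy (hypothetical, for illustration only).
sealed trait DataType
case object IntegerType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType

object SchemaCheck {
  // One entry point that accepts any DataType and rejects unsupported ones up front.
  // `require` throws IllegalArgumentException when the condition is false.
  def validate(schema: DataType): DataType = {
    require(
      schema.isInstanceOf[StructType] || schema.isInstanceOf[ArrayType],
      s"schema must be a struct or an array, got $schema")
    schema
  }
}
```

With this shape, callers pass either kind of schema to the same function, and anything else fails fast with a clear message instead of needing a second overload.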


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Should the title of this PR be updated to reflect that we *should* support arrays of JSON?


[GitHub] spark issue #16929: [WIP][SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73474/testReport)** for PR 16929 at commit [`25086ed`](https://github.com/apache/spark/commit/25086edc90b2bad7e27e8897108426fbd29dc00f).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Then, let me fix this as below:
    
    - `from_json` with `StructType`
      - JSON array -> `null` 
      - JSON object -> `Row(...)` 
    
    - `from_json` with `ArrayType`
      - JSON array -> `Array(Row(...), ...)`
      - JSON object -> `Array(Row(...), ...)`
    
    - exposed API 
      - `from_json(..., schema: StructType)`
      - `from_json(..., schema: DataType)` 
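
The behaviour matrix above can be modelled in plain Scala without Spark. This is a rough sketch under stated assumptions: `Json`, `Schema`, `Result`, and `FromJsonModel` are hypothetical names invented for illustration, rows are simplified to `Map[String, Int]`, and a single JSON object under an array schema is modelled as a one-element array.

```scala
// Hypothetical, simplified model of the proposed from_json behaviour (not Spark code).
sealed trait Json
final case class JsonObject(fields: Map[String, Int]) extends Json
final case class JsonArray(items: Seq[JsonObject]) extends Json

sealed trait Schema
case object StructSchema extends Schema
case object ArraySchema extends Schema

sealed trait Result
case object NullResult extends Result
final case class RowResult(row: Map[String, Int]) extends Result
final case class ArrayResult(rows: Seq[Map[String, Int]]) extends Result

object FromJsonModel {
  // One case per cell of the matrix: (schema kind, input kind) -> output.
  def fromJson(input: Json, schema: Schema): Result = (schema, input) match {
    case (StructSchema, JsonObject(fields)) => RowResult(fields)
    case (StructSchema, JsonArray(_))       => NullResult
    case (ArraySchema, JsonArray(items))    => ArrayResult(items.map(_.fields))
    case (ArraySchema, JsonObject(fields))  => ArrayResult(Seq(fields))
  }
}
```

Writing the matrix as one exhaustive pattern match makes it easy to see that every schema and input combination has a defined result.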



[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73475/
    Test PASSed.


[GitHub] spark issue #16929: [WIP][SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73467/
    Test FAILed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73491/testReport)** for PR 16929 at commit [`470d879`](https://github.com/apache/spark/commit/470d87969d8fa2de6adfd3765086e03ec8f12252).


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r104253528
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,23 +480,45 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    +      super.checkInputDataTypes()
    +    case _ => TypeCheckResult.TypeCheckFailure(
    +      s"Input schema ${schema.simpleString} must be a struct or an array of structs.")
    +  }
    +
    +  @transient
    +  lazy val rowSchema = schema match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts parsed rows to the desired output by the given schema.
    +  @transient
    +  lazy val converter = schema match {
    +    case _: StructType =>
    +      (rows: Seq[InternalRow]) => if (rows.length == 1) rows.head else null
    --- End diff --
    
    We should list this in the release notes though (i.e. go tag the JIRA).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    also cc @marmbrus 


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73111/testReport)** for PR 16929 at commit [`8122314`](https://github.com/apache/spark/commit/812231499518fa5ce5385f4bb28b4380fe9c7262).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73477/testReport)** for PR 16929 at commit [`a0a7091`](https://github.com/apache/spark/commit/a0a7091e58d84d1927b7e17511414bf952c73cf5).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Sure, let me change it to support arrays. I thought disallowing was the safer choice :)


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r104326605
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
    @@ -372,6 +372,62 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         )
       }
     
    +  test("from_json - input=array, schema=array, output=array") {
    --- End diff --
    
    these are great! thanks!


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Hi @marmbrus, does this sound good to you?


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103268734
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -39,7 +39,12 @@ private[sql] class SparkSQLJsonProcessingException(msg: String) extends RuntimeE
      */
     class JacksonParser(
         schema: StructType,
    -    options: JSONOptions) extends Logging {
    +    options: JSONOptions,
    +    arraySupport: Boolean = true,
    --- End diff --
    
    as I commented above, I don't think we need this


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    @HyukjinKwon Implementation seems fine. Just left a cosmetic comment on your unit tests. Otherwise LGTM


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73893/
    Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73621/testReport)** for PR 16929 at commit [`0c088bf`](https://github.com/apache/spark/commit/0c088bfc9469f5dc546f4d153ada609ad3b0b6ef).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    I just updated the PR description to prevent confusion.


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103268655
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    +      super.checkInputDataTypes()
    +    case _ => TypeCheckResult.TypeCheckFailure(
    +      s"Input schema ${schema.simpleString} must be a struct or an array of structs.")
    +  }
    +
    +  @transient
    +  lazy val rowSchema = schema match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts parsed rows to the desired output by the given schema.
    +  @transient
    +  lazy val converter = schema match {
    +    case _: StructType =>
    +      // These are always produced from json objects by `objectSupport` in `JacksonParser`.
    +      (rows: Seq[InternalRow]) => rows.head
    +
    +    case ArrayType(_: StructType, _) =>
    +      // These are always produced from json arrays by `arraySupport` in `JacksonParser`.
    +      (rows: Seq[InternalRow]) => new GenericArrayData(rows)
    +  }
    +
       @transient
       lazy val parser =
         new JacksonParser(
    -      schema,
    -      new JSONOptions(options + ("mode" -> ParseModes.FAIL_FAST_MODE), timeZoneId.get))
    +      rowSchema,
    +      new JSONOptions(options + ("mode" -> ParseModes.FAIL_FAST_MODE), timeZoneId.get),
    +      objectSupport = schema.isInstanceOf[StructType],
    --- End diff --
    
    Do you think we need the `objectSupport` and `arraySupport`?
    I would rather not add it. If someone specifies an `ArrayType` but the row contains just an object, let's still just return it as an `ArrayType`. I think users would appreciate this.
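
One way to drop the two flags, sketched here in plain Scala, is to let the parser always yield a sequence of rows and have the schema alone pick the converter. This is an assumption-laden illustration rather than the PR's code: `ConverterSketch` and `converterFor` are hypothetical names, `Option` stands in for a nullable result, and the single-row rule mirrors the `rows.length == 1` check that appears in another revision of the diff quoted in this thread.

```scala
// Hypothetical sketch (not Spark code): the schema kind alone selects the converter.
object ConverterSketch {
  type Row = Map[String, Int]

  // Assumption: the parser returns one row for a JSON object and n rows for a JSON array.
  def converterFor(schemaIsArray: Boolean): Seq[Row] => Option[Any] =
    if (schemaIsArray) {
      // Array schema: keep every parsed row, so a lone object becomes a one-element array.
      rows => Some(rows)
    } else {
      // Struct schema: accept exactly one row, otherwise yield a null-like result.
      rows => if (rows.length == 1) Some(rows.head) else None
    }
}
```

Under this design, an object parsed with an array schema is still returned as an array, which is the behaviour the comment above asks for.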


[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #72878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72878/testReport)** for PR 16929 at commit [`acbce26`](https://github.com/apache/spark/commit/acbce26cd983c4e3510a8db707196e3cd848aba2).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73491/
    Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73477/testReport)** for PR 16929 at commit [`a0a7091`](https://github.com/apache/spark/commit/a0a7091e58d84d1927b7e17511414bf952c73cf5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73475/testReport)** for PR 16929 at commit [`2aaf609`](https://github.com/apache/spark/commit/2aaf609c70877f40c7f5d179ca42463abc666bac).


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103386481
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    --- End diff --
    
    I tried several combinations with `TypeCollection`, but none of them seem to work.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73491/testReport)** for PR 16929 at commit [`470d879`](https://github.com/apache/spark/commit/470d87969d8fa2de6adfd3765086e03ec8f12252).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73110/testReport)** for PR 16929 at commit [`25cdd7d`](https://github.com/apache/spark/commit/25cdd7d002629db9be6c8a5460a6044f499a6e6d).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    @hvanhovell, @zsxwing and @marmbrus, I just updated and rebased. Could you take another look, please?


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73893/testReport)** for PR 16929 at commit [`3d490e3`](https://github.com/apache/spark/commit/3d490e34136ec76deaed15ebe8d6e7e8aac96776).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73621/
    Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #72878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72878/testReport)** for PR 16929 at commit [`acbce26`](https://github.com/apache/spark/commit/acbce26cd983c4e3510a8db707196e3cd848aba2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103337914
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    +      super.checkInputDataTypes()
    +    case _ => TypeCheckResult.TypeCheckFailure(
    +      s"Input schema ${schema.simpleString} must be a struct or an array of structs.")
    +  }
    +
    +  @transient
    +  lazy val rowSchema = schema match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts parsed rows to the desired output by the given schema.
    +  @transient
    +  lazy val converter = schema match {
    +    case _: StructType =>
    +      // These are always produced from json objects by `objectSupport` in `JacksonParser`.
    +      (rows: Seq[InternalRow]) => rows.head
    +
    +    case ArrayType(_: StructType, _) =>
    +      // These are always produced from json arrays by `arraySupport` in `JacksonParser`.
    +      (rows: Seq[InternalRow]) => new GenericArrayData(rows)
    +  }
    +
       @transient
       lazy val parser =
         new JacksonParser(
    -      schema,
    -      new JSONOptions(options + ("mode" -> ParseModes.FAIL_FAST_MODE), timeZoneId.get))
    +      rowSchema,
    +      new JSONOptions(options + ("mode" -> ParseModes.FAIL_FAST_MODE), timeZoneId.get),
    +      objectSupport = schema.isInstanceOf[StructType],
    --- End diff --
    
    > What does the input look like, and what are they specifying?
    
    `JacksonParser.parse` produces `Seq[InternalRow]` and takes a `StructType`. By `objectSupport` and `arraySupport` I mean support for a JSON object and a JSON array as the root value. We support both as the root, and the intent is to produce `null` for whichever one is disabled.
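    
    To illustrate the intent, here is a rough sketch under stated assumptions: `Root`, `ObjectRoot`, `ArrayRoot` and `parseRoot` are hypothetical names, rows are modelled as plain strings, and `None` stands in for the SQL `null`. It is not the actual `JacksonParser` code.
    
    ```scala
    object RootSupportSketch {
      // Hypothetical model of the root JSON value; rows are plain strings here.
      sealed trait Root
      final case class ObjectRoot(row: String) extends Root
      final case class ArrayRoot(rows: Seq[String]) extends Root
    
      // Yields the parsed rows when the root kind is enabled,
      // and None (standing in for null) when it is disabled.
      def parseRoot(
          root: Root,
          objectSupport: Boolean,
          arraySupport: Boolean): Option[Seq[String]] = root match {
        case ObjectRoot(row) if objectSupport => Some(Seq(row))
        case ArrayRoot(rows) if arraySupport  => Some(rows)
        case _                                => None
      }
    }
    ```
    
    With a struct schema only `objectSupport` would be enabled, so an array root would yield `None` in this sketch.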
    



[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73111/
    Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73572/testReport)** for PR 16929 at commit [`54e60bb`](https://github.com/apache/spark/commit/54e60bb149cd882c21856f19df0cf375c3ca3b20).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Hi @marmbrus, @brkyvz and @hvanhovell, I think it is ready for a review.


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103337028
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2969,11 +2969,27 @@ object functions {
       }
     
       /**
    -   * (Java-specific) Parses a column containing a JSON string into a `StructType` with the
    +   * (Scala-specific) Parses a column containing a JSON array string into a `ArrayType` with the
        * specified schema. Returns `null`, in the case of an unparseable string.
        *
    -   * @param e a string column containing JSON data.
    -   * @param schema the schema to use when parsing the json string
    +   * @param e a string column containing JSON array data.
    +   * @param schema the schema to use when parsing the json array string
    +   * @param options options to control how the json is parsed. accepts the same options and the
    +   *                json data source.
    +   *
    +   * @group collection_funcs
    +   * @since 2.2.0
    +   */
    +  def from_json(e: Column, schema: ArrayType, options: Map[String, String]): Column = withExpr {
    --- End diff --
    
    I think you can leave a bridge method in this case.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73114/
    Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73114/testReport)** for PR 16929 at commit [`d6fd39b`](https://github.com/apache/spark/commit/d6fd39b97aea3deaec53cfeaf522467c831c4ec4).


[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    I agree that it's wrong to truncate, but why not just fix the handling of arrays rather than disallowing them?


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73113/testReport)** for PR 16929 at commit [`c7b5f2e`](https://github.com/apache/spark/commit/c7b5f2e3539da3db24953049f23aac0c07043fbd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r104253014
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,23 +480,45 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    +      super.checkInputDataTypes()
    +    case _ => TypeCheckResult.TypeCheckFailure(
    +      s"Input schema ${schema.simpleString} must be a struct or an array of structs.")
    +  }
    +
    +  @transient
    +  lazy val rowSchema = schema match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts parsed rows to the desired output by the given schema.
    +  @transient
    +  lazy val converter = schema match {
    +    case _: StructType =>
    +      (rows: Seq[InternalRow]) => if (rows.length == 1) rows.head else null
    --- End diff --
    
    This breaks previous behavior. I would still return the first element. Feel free to push back. I also wonder what @marmbrus thinks.
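    
    The two options can be sketched side by side. This is only an illustration: `strict` and `lenient` are hypothetical names, rows are modelled as strings, and `None` stands in for `null`.
    
    ```scala
    object ConverterSketch {
      // Behaviour in the diff above: a result only when exactly one row was parsed.
      def strict(rows: Seq[String]): Option[String] =
        if (rows.length == 1) rows.headOption else None
    
      // Previous behaviour: return the first element, whatever else was parsed.
      def lenient(rows: Seq[String]): Option[String] =
        rows.headOption
    }
    ```
    
    The two agree on single-row input and differ only when a JSON array with several elements is parsed against a struct schema.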


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73110/
    Test PASSed.


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103334238
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    --- End diff --
    
    Uh.. I thought `schema` is not a child but just a parameter. Let me check!


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73474/
    Test PASSed.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73570/
    Test FAILed.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merging to master. Thanks!




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103333990
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2969,11 +2969,27 @@ object functions {
       }
     
       /**
    -   * (Java-specific) Parses a column containing a JSON string into a `StructType` with the
    +   * (Scala-specific) Parses a column containing a JSON array string into a `ArrayType` with the
        * specified schema. Returns `null`, in the case of an unparseable string.
        *
    -   * @param e a string column containing JSON data.
    -   * @param schema the schema to use when parsing the json string
    +   * @param e a string column containing JSON array data.
    +   * @param schema the schema to use when parsing the json array string
    +   * @param options options to control how the json is parsed. accepts the same options and the
    +   *                json data source.
    +   *
    +   * @group collection_funcs
    +   * @since 2.2.0
    +   */
    +  def from_json(e: Column, schema: ArrayType, options: Map[String, String]): Column = withExpr {
    --- End diff --
    
    I thought changing `StructType` to `DataType` would break binary compatibility: the method signature changes, so an app compiled against 2.1.0 would not run on 2.2.0.
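
A minimal sketch of the concern raised above, not code from the PR. On the JVM a compiled call site links against the exact parameter types, so widening a parameter in place removes the signature that older binaries expect. Every name here (`OldApi`, `WidenedApi`, `OverloadedApi`, `fromJson`, `hasStructSignature`, and the stand-in `DataType` hierarchy) is hypothetical and only models the situation, so the sketch runs without Spark as a Scala script or in the REPL.

```scala
// Hypothetical stand-ins for Spark's types, so the sketch runs without Spark.
sealed trait DataType
final case class StructType(fieldNames: List[String]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// "Old" API: only the StructType signature exists.
object OldApi {
  def fromJson(json: String, schema: StructType): String = "struct: " + json
}

// "New" API with the parameter widened in place: the StructType signature is gone.
object WidenedApi {
  def fromJson(json: String, schema: DataType): String = "parsed: " + json
}

// "New" API that adds the wider overload and keeps the old signature as a forwarder.
object OverloadedApi {
  def fromJson(json: String, schema: StructType): String = fromJson(json, schema: DataType)
  def fromJson(json: String, schema: DataType): String = "parsed: " + json
}

// An already-compiled caller needs a method with exactly the parameter types
// (String, StructType); reflection lets us check whether one is still present.
def hasStructSignature(api: AnyRef): Boolean =
  try {
    api.getClass.getMethod("fromJson", classOf[String], classOf[StructType])
    true
  } catch {
    case _: NoSuchMethodException => false
  }
```

In this model `hasStructSignature` holds for `OldApi` and `OverloadedApi` but not for `WidenedApi`, which is the case where a previously compiled caller would be expected to fail at link time. Keeping the narrower overload next to the wider one is one common way to avoid that.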




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r104278223
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
    @@ -372,6 +372,58 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         )
       }
     
    +  test("from_json - array") {
    +    val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
    +
    +    // json array: `Array(Row(...), ...)`
    +    val jsonData1 = """[{"a": 1}, {"a": 2}]"""
    +    val expected =
    +      InternalRow.fromSeq(1 :: Nil) ::
    +      InternalRow.fromSeq(2 :: Nil) :: Nil
    +    checkEvaluation(JsonToStruct(
    +      schema, Map.empty, Literal(jsonData1), gmtId), expected)
    +
    +    // json object: `Array(Row(...))`
    +    val jsonData2 = """{"a": 1}"""
    --- End diff --
    
    I would make each example a separate test. This way it's easier to figure out what breaks later.
    
    e.g.
      1. `from_json - input=array, schema=array, output=array`
      2. `from_json - input=object, schema=array, output=array of single object`
      3. `from_json - input=empty json array, schema=array, output=empty array`
    ...





[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73114/testReport)** for PR 16929 at commit [`d6fd39b`](https://github.com/apache/spark/commit/d6fd39b97aea3deaec53cfeaf522467c831c4ec4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73113/
    Test FAILed.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
     Merged build triggered.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Hmm, I'm not sure we want to change this to a generator.  I think that has performance consequences as well as possibly being surprising.  I would probably make it possible to handle arrays (when the correct schema is given).  If they want to explode they can run `explode(from_json(...))`.
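
The suggestion above can be sketched as follows. This is an illustrative example rather than code from the PR: it assumes Spark 2.2.0 or later (where `from_json` accepts an `ArrayType` schema), a local `SparkSession`, and made-up column names `json`, `items`, and `item`. It is meant to be run as a script or pasted into a REPL with Spark on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StructField, StructType}

// Assumption: a throwaway local session; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder().master("local[1]").appName("from-json-array").getOrCreate()
import spark.implicits._

// Schema describing a JSON array of {"a": <int>} objects.
val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))

val df = Seq("""[{"a": 1}, {"a": 2}]""").toDF("json")

// from_json stays an ordinary (non-generator) expression: one array value per input row.
val parsed = df.select(from_json(col("json"), schema).as("items"))

// Callers who want one output row per array element opt in explicitly with explode.
val exploded = parsed.select(explode(col("items")).as("item"))
val values = exploded.select(col("item.a")).as[Int].collect().toSeq
```

Keeping the explosion as a separate, explicit step means the row count only changes when the caller asks for it.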




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Sure, let me take a look and try.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73110/testReport)** for PR 16929 at commit [`25cdd7d`](https://github.com/apache/spark/commit/25cdd7d002629db9be6c8a5460a6044f499a6e6d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73111/testReport)** for PR 16929 at commit [`8122314`](https://github.com/apache/spark/commit/812231499518fa5ce5385f4bb28b4380fe9c7262).




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    /cc @brkyvz 




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73621/testReport)** for PR 16929 at commit [`0c088bf`](https://github.com/apache/spark/commit/0c088bfc9469f5dc546f4d153ada609ad3b0b6ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73113/testReport)** for PR 16929 at commit [`c7b5f2e`](https://github.com/apache/spark/commit/c7b5f2e3539da3db24953049f23aac0c07043fbd).




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103262990
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2969,11 +2969,27 @@ object functions {
       }
     
       /**
    -   * (Java-specific) Parses a column containing a JSON string into a `StructType` with the
    +   * (Scala-specific) Parses a column containing a JSON array string into a `ArrayType` with the
        * specified schema. Returns `null`, in the case of an unparseable string.
        *
    -   * @param e a string column containing JSON data.
    -   * @param schema the schema to use when parsing the json string
    +   * @param e a string column containing JSON array data.
    +   * @param schema the schema to use when parsing the json array string
    +   * @param options options to control how the json is parsed. accepts the same options and the
    +   *                json data source.
    +   *
    +   * @group collection_funcs
    +   * @since 2.2.0
    +   */
    +  def from_json(e: Column, schema: ArrayType, options: Map[String, String]): Column = withExpr {
    --- End diff --
    
    Why do we need the `ArrayType`-specific methods? Can't we just change `StructType` to `DataType` and do a `require` check?
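
A rough sketch of what such a `require` check could look like, under stated assumptions. The `DataType` hierarchy below is a hypothetical stand-in so the example runs without Spark, `validateJsonSchema` is an invented helper name, and the error text is borrowed from the `checkInputDataTypes` message quoted elsewhere in this thread.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy, so the sketch runs without Spark.
sealed trait DataType
final case class StructType(fieldNames: List[String]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType
case object StringType extends DataType

// A single entry point takes the wide type and rejects unsupported schemas up front.
def validateJsonSchema(schema: DataType): DataType = {
  val supported = schema match {
    case _: StructType            => true  // a single JSON object
    case ArrayType(_: StructType) => true  // a JSON array of objects
    case _                        => false // anything else is rejected
  }
  require(supported, s"Input schema $schema must be a struct or an array of structs.")
  schema
}
```

The trade-off of this design is that an unsupported schema is only rejected at run time, with an `IllegalArgumentException` from `require`, whereas type-specific overloads reject it at compile time.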




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Thanks for your detailed look. Let me check again and address the comments!




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73572/
    Test FAILed.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Thank you @brkyvz.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16929




[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build started.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73574/testReport)** for PR 16929 at commit [`0c088bf`](https://github.com/apache/spark/commit/0c088bfc9469f5dc546f4d153ada609ad3b0b6ef).




[GitHub] spark issue #16929: [WIP][SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73573/testReport)** for PR 16929 at commit [`9f1e966`](https://github.com/apache/spark/commit/9f1e96637cd6d67db0b5811daf2b33a9f49980a5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73574/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103302035
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    +      super.checkInputDataTypes()
    +    case _ => TypeCheckResult.TypeCheckFailure(
    +      s"Input schema ${schema.simpleString} must be a struct or an array of structs.")
    +  }
    +
    +  @transient
    +  lazy val rowSchema = schema match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts parsed rows to the desired output by the given schema.
    +  @transient
    +  lazy val converter = schema match {
    +    case _: StructType =>
    +      // These are always produced from json objects by `objectSupport` in `JacksonParser`.
    +      (rows: Seq[InternalRow]) => rows.head
    +
    +    case ArrayType(_: StructType, _) =>
    +      // These are always produced from json arrays by `arraySupport` in `JacksonParser`.
    +      (rows: Seq[InternalRow]) => new GenericArrayData(rows)
    +  }
    +
       @transient
       lazy val parser =
         new JacksonParser(
    -      schema,
    -      new JSONOptions(options + ("mode" -> ParseModes.FAIL_FAST_MODE), timeZoneId.get))
    +      rowSchema,
    +      new JSONOptions(options + ("mode" -> ParseModes.FAIL_FAST_MODE), timeZoneId.get),
    +      objectSupport = schema.isInstanceOf[StructType],
    --- End diff --
    
    I'm not sure I follow.  What does the input look like, and what are they specifying?  I would avoid magic unless we really think users need it.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    @marmbrus and @brkyvz, are there other things I should double check?




[GitHub] spark issue #16929: [WIP][SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73467/testReport)** for PR 16929 at commit [`8c48436`](https://github.com/apache/spark/commit/8c48436a7a531db39e951ed1f43fb9aa21008e0b).




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103300622
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2969,11 +2969,27 @@ object functions {
       }
     
       /**
    -   * (Java-specific) Parses a column containing a JSON string into a `StructType` with the
    +   * (Scala-specific) Parses a column containing a JSON array string into a `ArrayType` with the
        * specified schema. Returns `null`, in the case of an unparseable string.
        *
    -   * @param e a string column containing JSON data.
    -   * @param schema the schema to use when parsing the json string
    +   * @param e a string column containing JSON array data.
    +   * @param schema the schema to use when parsing the json array string
    +   * @param options options to control how the json is parsed. accepts the same options and the
    +   *                json data source.
    +   *
    +   * @group collection_funcs
    +   * @since 2.2.0
    +   */
    +  def from_json(e: Column, schema: ArrayType, options: Map[String, String]): Column = withExpr {
    --- End diff --
    
    Really, why not just support any `DataType` here?




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73893/testReport)** for PR 16929 at commit [`3d490e3`](https://github.com/apache/spark/commit/3d490e34136ec76deaed15ebe8d6e7e8aac96776).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    retest this please




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    cc @brkyvz and @marmbrus, could this be merged by any chance?




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73570/testReport)** for PR 16929 at commit [`72d6410`](https://github.com/apache/spark/commit/72d641018635aae94cc89e216e30540233d461f4).




[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72878/
    Test PASSed.




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103268156
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    --- End diff --
    
    why not just override:
    
    `override def inputTypes = new TypeCollection(ArrayType, StructType) :: Nil`




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103338371
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    --- End diff --
    
    Uh.. I thought `schema` is not the child of the expression. Let me check again!




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73477/
    Test PASSed.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Do not allow json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Cc @hvanhovell, could you please take a look and see if this makes sense?




[GitHub] spark issue #16929: [WIP][SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Let me clean up, fix any failing tests, and update the PR description soon. It is still a WIP.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    @brkyvz, @marmbrus - I think it is ready for another look. Could you see if I understood your comments correctly?




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73573/
    Test FAILed.




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r104253484
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,23 +480,45 @@ case class JsonTuple(children: Seq[Expression])
     }
     
     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
       override def nullable: Boolean = true
     
    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)
     
    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    +      super.checkInputDataTypes()
    +    case _ => TypeCheckResult.TypeCheckFailure(
    +      s"Input schema ${schema.simpleString} must be a struct or an array of structs.")
    +  }
    +
    +  @transient
    +  lazy val rowSchema = schema match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts parsed rows to the desired output by the given schema.
    +  @transient
    +  lazy val converter = schema match {
    +    case _: StructType =>
    +      (rows: Seq[InternalRow]) => if (rows.length == 1) rows.head else null
    --- End diff --
    
    I'm okay breaking previous behavior because I'd call truncating an array a bug.
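
The behavior discussed in this comment can be sketched without Spark. The block below is only an illustrative model, not the code in the PR: `Schema`, `StructSchema`, `ArrayOfStructsSchema` and `Row` are hypothetical stand-ins for `DataType`, `StructType`, `ArrayType(StructType)` and `InternalRow`, and `Option` is used where the diff returns `null`. It shows that, for a struct schema, a JSON array with more than one element yields no value instead of silently keeping only the first row, while an array schema keeps every row.

```scala
object ConverterSketch {
  // Hypothetical stand-in for InternalRow: one parsed JSON object.
  type Row = Map[String, Int]

  // Hypothetical stand-ins for the two schema shapes accepted in the diff.
  sealed trait Schema
  case object StructSchema extends Schema
  case object ArrayOfStructsSchema extends Schema

  // Chooses, per schema, how the parsed rows become the expression's result.
  def converter(schema: Schema): Seq[Row] => Option[Any] = schema match {
    case StructSchema =>
      // Exactly one row is expected; anything else is treated as unparseable
      // rather than truncated to the first element.
      rows => if (rows.length == 1) Some(rows.head) else None
    case ArrayOfStructsSchema =>
      // Every parsed row is kept.
      rows => Some(rows)
  }

  def main(args: Array[String]): Unit = {
    val parsedObject: Seq[Row] = Seq(Map("a" -> 1))
    val parsedArray: Seq[Row] = Seq(Map("a" -> 1), Map("a" -> 2))

    println(converter(StructSchema)(parsedObject))        // the single row
    println(converter(StructSchema)(parsedArray))         // no value
    println(converter(ArrayOfStructsSchema)(parsedArray)) // both rows
  }
}
```

Returning no value for the struct case is the behavior change being accepted here: the earlier behavior kept the first element of the array and dropped the rest.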




[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r104278176
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
    @@ -372,6 +372,58 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         )
       }
     
    +  test("from_json - array") {
    +    val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
    +
    +    // json array: `Array(Row(...), ...)`
    +    val jsonData1 = """[{"a": 1}, {"a": 2}]"""
    +    val expected =
    +      InternalRow.fromSeq(1 :: Nil) ::
    +      InternalRow.fromSeq(2 :: Nil) :: Nil
    +    checkEvaluation(JsonToStruct(
    +      schema, Map.empty, Literal(jsonData1), gmtId), expected)
    --- End diff --
    
    Could you put the input and the expected output on separate lines for readability, please?




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16929: [WIP][SPARK-19595][SQL] Support json array in from_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    **[Test build #73467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73467/testReport)** for PR 16929 at commit [`8c48436`](https://github.com/apache/spark/commit/8c48436a7a531db39e951ed1f43fb9aa21008e0b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16929
  
    Thank you so much. Let me clean up.

