Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2017/03/24 03:52:12 UTC

[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/17406

    [SPARK-20009][SQL] Use DDL strings for defining schema in user-facing APIs

    ## What changes were proposed in this pull request?
    This PR adds `DataType.fromString` to support both the existing JSON format and the new DDL format for defining schemas in user-facing APIs, e.g., `functions.from_json`.
    
    ## How was this patch tested?
    Added tests in `JsonFunctionsSuite` and modified some existing tests.
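
For illustration of the change described above, a minimal sketch of how the two schema formats would be passed to `functions.from_json`; the `payload` column and the field names are made up for the example:

```
import java.util.Collections
import org.apache.spark.sql.functions.{col, from_json}

val noOptions = Collections.emptyMap[String, String]()

// Spark 2.1 style: the schema is given as a JSON-formatted string.
val parsedViaJsonSchema = from_json(
  col("payload"),  // hypothetical string column holding JSON such as {"a": 1, "b": "x"}
  """{"type":"struct","fields":[
    |  {"name":"a","type":"integer","nullable":true,"metadata":{}},
    |  {"name":"b","type":"string","nullable":true,"metadata":{}}]}""".stripMargin,
  noOptions)

// With this change: the same schema expressed as a DDL string.
val parsedViaDdlSchema = from_json(col("payload"), "a INT, b STRING", noOptions)
```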

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-20009

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17406
    
----
commit 87abf7b23f284e3a7728efc90b2ac52a3ce9d294
Author: Takeshi Yamamuro <ya...@apache.org>
Date:   2017-03-18T03:56:20Z

    Use DDL strings for defining schema

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108339957
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3055,13 +3056,21 @@ object functions {
        * with the specified schema. Returns `null`, in the case of an unparseable string.
        *
        * @param e a string column containing JSON data.
    -   * @param schema the schema to use when parsing the json string as a json string
    +   * @param schema the schema to use when parsing the json string as a json string. In Spark 2.1,
    +   *               the user-provided schema has to be in JSON format. Since Spark 2.2, the DDL
    +   *               format is also supported for the schema.
        *
        * @group collection_funcs
        * @since 2.1.0
        */
    -  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
    -    from_json(e, DataType.fromJson(schema), options)
    +  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
    +    val dataType = try {
    --- End diff --
    
    That is fine, right? cc @cloud-fan 
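
The hunk above is truncated at the `try`; judging from the analogous JSON-then-DDL fallback quoted later in this thread (the `SimpleTextRelation` hunk), the body under discussion is presumably along these lines. This is a sketch under that assumption, not necessarily the exact committed code:

```
import scala.util.control.NonFatal
import org.apache.spark.sql.types.DataType

def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
  // Keep backward compatibility: try the Spark 2.1 JSON schema format first,
  // then fall back to the DDL format (e.g., "a INT, b STRING") if JSON parsing fails.
  val dataType = try {
    DataType.fromJson(schema)
  } catch {
    case NonFatal(_) => DataType.fromDDL(schema)
  }
  from_json(e, dataType, options)
}
```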




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Also cc @cloud-fan @hvanhovell 
    
    If there are no further comments, maybe we can merge it to master tomorrow?




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Jenkins, retest this please.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    oh, it seems we hit weird errors...




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75227/testReport)** for PR 17406 at commit [`6378e99`](https://github.com/apache/spark/commit/6378e9922cd750bda134ca0a6ce1234f615252ca).




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75207/testReport)** for PR 17406 at commit [`ec5452f`](https://github.com/apache/spark/commit/ec5452fb6a3c97a05d29c10b2f843fc96570e092).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75227/testReport)** for PR 17406 at commit [`6378e99`](https://github.com/apache/spark/commit/6378e9922cd750bda134ca0a6ce1234f615252ca).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108042411
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -169,30 +169,45 @@ class DataTypeSuite extends SparkFunSuite {
         assert(!arrayType.existsRecursively(_.isInstanceOf[IntegerType]))
       }
     
    -  def checkDataTypeJsonRepr(dataType: DataType): Unit = {
    -    test(s"JSON - $dataType") {
    +  def checkDataTypeFromJson(dataType: DataType): Unit = {
    +    test(s"from Json - $dataType") {
           assert(DataType.fromJson(dataType.json) === dataType)
         }
       }
     
    -  checkDataTypeJsonRepr(NullType)
    -  checkDataTypeJsonRepr(BooleanType)
    -  checkDataTypeJsonRepr(ByteType)
    -  checkDataTypeJsonRepr(ShortType)
    -  checkDataTypeJsonRepr(IntegerType)
    -  checkDataTypeJsonRepr(LongType)
    -  checkDataTypeJsonRepr(FloatType)
    -  checkDataTypeJsonRepr(DoubleType)
    -  checkDataTypeJsonRepr(DecimalType(10, 5))
    -  checkDataTypeJsonRepr(DecimalType.SYSTEM_DEFAULT)
    -  checkDataTypeJsonRepr(DateType)
    -  checkDataTypeJsonRepr(TimestampType)
    -  checkDataTypeJsonRepr(StringType)
    -  checkDataTypeJsonRepr(BinaryType)
    -  checkDataTypeJsonRepr(ArrayType(DoubleType, true))
    -  checkDataTypeJsonRepr(ArrayType(StringType, false))
    -  checkDataTypeJsonRepr(MapType(IntegerType, StringType, true))
    -  checkDataTypeJsonRepr(MapType(IntegerType, ArrayType(DoubleType), false))
    +  def checkDataTypeFromDDL(dataType: DataType): Unit = {
    +    test(s"from DDL - $dataType") {
    +      assert(DataType.fromDDL(s"a ${dataType.sql}") === new StructType().add("a", dataType))
    +    }
    +  }
    +
    +  def checkDataTypeFromText(dataType: DataType): Unit = {
    +    checkDataTypeFromJson(dataType)
    +    checkDataTypeFromDDL(dataType)
    +  }
    +
    +  // In some types, check json formats only because the types do not support DDL formats.
    +  checkDataTypeFromJson(NullType)
    +
    +  checkDataTypeFromText(BooleanType)
    +  checkDataTypeFromText(ByteType)
    +  checkDataTypeFromText(ShortType)
    +  checkDataTypeFromText(IntegerType)
    +  checkDataTypeFromText(LongType)
    +  checkDataTypeFromText(FloatType)
    +  checkDataTypeFromText(DoubleType)
    +  checkDataTypeFromText(DecimalType(10, 5))
    +  checkDataTypeFromText(DecimalType.SYSTEM_DEFAULT)
    +  checkDataTypeFromText(DateType)
    +  checkDataTypeFromText(TimestampType)
    +  checkDataTypeFromText(StringType)
    +  checkDataTypeFromText(BinaryType)
    +
    +  checkDataTypeFromText(ArrayType(DoubleType, true))
    +  checkDataTypeFromJson(ArrayType(StringType, false))
    +
    +  checkDataTypeFromText(MapType(IntegerType, StringType, true))
    +  checkDataTypeFromJson(MapType(IntegerType, ArrayType(DoubleType), false))
    --- End diff --
    
    You can keep both, but add an extra one for `checkDataTypeFromJson` when the nullability is `false`.
    
    ```
    checkDataTypeFromText(MapType(IntegerType, ArrayType(DoubleType), true))
    checkDataTypeFromJson(MapType(IntegerType, ArrayType(DoubleType), false))
    ```




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75313/
    Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75303/
    Test PASSed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108343902
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,12 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  /**
    +   * Creates StructType for a given DDL-formatted string, which is a comma separated list of field
    +   * definitions, e.g., a INT, b STRING.
    +   */
    +  def fromDDL(ddl: String): StructType = CatalystSqlParser.parseTableSchema(ddl)
    --- End diff --
    
    shall we move it to `object StructType`?
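
For reference, the helper under discussion (as shown in the `DataType.scala` hunk above) takes a comma-separated list of field definitions and returns a `StructType`; a small usage sketch with made-up field names:

```
import org.apache.spark.sql.types._

// DDL-formatted schema: a comma-separated list of "name TYPE" field definitions.
val schema = DataType.fromDDL("a INT, b STRING, c ARRAY<DOUBLE>")

// Equivalent to building the StructType by hand (all fields nullable by default).
assert(schema === new StructType()
  .add("a", IntegerType)
  .add("b", StringType)
  .add("c", ArrayType(DoubleType)))
```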




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108344017
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3055,13 +3056,21 @@ object functions {
        * with the specified schema. Returns `null`, in the case of an unparseable string.
        *
        * @param e a string column containing JSON data.
    -   * @param schema the schema to use when parsing the json string as a json string
    +   * @param schema the schema to use when parsing the json string as a json string. In Spark 2.1,
    +   *               the user-provided schema has to be in JSON format. Since Spark 2.2, the DDL
    +   *               format is also supported for the schema.
        *
        * @group collection_funcs
        * @since 2.1.0
        */
    -  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
    -    from_json(e, DataType.fromJson(schema), options)
    +  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
    +    val dataType = try {
    --- End diff --
    
    yea I think it's fine




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75196/testReport)** for PR 17406 at commit [`ec5452f`](https://github.com/apache/spark/commit/ec5452fb6a3c97a05d29c10b2f843fc96570e092).




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107853540
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    Thanks for the valuable comments!
    Yea, I basically agree with @HyukjinKwon's idea. Also, `fromDdl` sounds better than `fromString` to me, as @viirya suggested. I'll update soon.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    retest this please




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75141/testReport)** for PR 17406 at commit [`87abf7b`](https://github.com/apache/spark/commit/87abf7b23f284e3a7728efc90b2ac52a3ce9d294).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Support DDL strings for defini...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17406




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75159/testReport)** for PR 17406 at commit [`fb61535`](https://github.com/apache/spark/commit/fb61535c6f3088fe093e49043308b4a33b65fd51).




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107847861
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    In this PR, I just propose supporting the new DDL format in the existing APIs that already use the JSON format for defining schemas (e.g., `functions.from_json`). So, we need to support both the JSON and DDL formats there. If I misunderstood you, could you correct me? Thanks for your comment!




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75236/testReport)** for PR 17406 at commit [`e2d121f`](https://github.com/apache/spark/commit/e2d121f214c515cbe043d7954c367570048944b7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107969140
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -169,30 +169,43 @@ class DataTypeSuite extends SparkFunSuite {
         assert(!arrayType.existsRecursively(_.isInstanceOf[IntegerType]))
       }
     
    -  def checkDataTypeJsonRepr(dataType: DataType): Unit = {
    -    test(s"JSON - $dataType") {
    +  // Test json formats only for the types that DDL formats do not support
    +  def checkDataTypeFromJson(dataType: DataType): Unit = {
    +    test(s"from json - $dataType") {
           assert(DataType.fromJson(dataType.json) === dataType)
         }
       }
     
    -  checkDataTypeJsonRepr(NullType)
    -  checkDataTypeJsonRepr(BooleanType)
    -  checkDataTypeJsonRepr(ByteType)
    -  checkDataTypeJsonRepr(ShortType)
    -  checkDataTypeJsonRepr(IntegerType)
    -  checkDataTypeJsonRepr(LongType)
    -  checkDataTypeJsonRepr(FloatType)
    -  checkDataTypeJsonRepr(DoubleType)
    -  checkDataTypeJsonRepr(DecimalType(10, 5))
    -  checkDataTypeJsonRepr(DecimalType.SYSTEM_DEFAULT)
    -  checkDataTypeJsonRepr(DateType)
    -  checkDataTypeJsonRepr(TimestampType)
    -  checkDataTypeJsonRepr(StringType)
    -  checkDataTypeJsonRepr(BinaryType)
    -  checkDataTypeJsonRepr(ArrayType(DoubleType, true))
    -  checkDataTypeJsonRepr(ArrayType(StringType, false))
    -  checkDataTypeJsonRepr(MapType(IntegerType, StringType, true))
    -  checkDataTypeJsonRepr(MapType(IntegerType, ArrayType(DoubleType), false))
    +  def checkDataTypeFromText(dataType: DataType): Unit = {
    +    checkDataTypeFromJson(dataType)
    +
    +    // Test DDL formats
    +    test(s"from ddl - $dataType") {
    +      assert(DataType.fromDdl(s"a ${dataType.sql}") === new StructType().add("a", dataType))
    +    }
    --- End diff --
    
    How about defining a function like `checkDataTypeFromJson`? That would look more consistent.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107967499
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3060,8 +3061,17 @@ object functions {
        * @group collection_funcs
        * @since 2.1.0
        */
    -  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
    -    from_json(e, DataType.fromJson(schema), options)
    +  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
    +    // Until Spark-2.1, we use json strings for defining schemas here. Since we add an user-friendly
    +    // API in the DDL parser, we employ DDL formats for the case. To keep back-compatibility,
    --- End diff --
    
    ` we employ DDL formats for the case` -> `DDL format is also supported here`




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108050303
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -169,30 +169,45 @@ class DataTypeSuite extends SparkFunSuite {
         assert(!arrayType.existsRecursively(_.isInstanceOf[IntegerType]))
       }
     
    -  def checkDataTypeJsonRepr(dataType: DataType): Unit = {
    -    test(s"JSON - $dataType") {
    +  def checkDataTypeFromJson(dataType: DataType): Unit = {
    +    test(s"from Json - $dataType") {
           assert(DataType.fromJson(dataType.json) === dataType)
         }
       }
     
    -  checkDataTypeJsonRepr(NullType)
    -  checkDataTypeJsonRepr(BooleanType)
    -  checkDataTypeJsonRepr(ByteType)
    -  checkDataTypeJsonRepr(ShortType)
    -  checkDataTypeJsonRepr(IntegerType)
    -  checkDataTypeJsonRepr(LongType)
    -  checkDataTypeJsonRepr(FloatType)
    -  checkDataTypeJsonRepr(DoubleType)
    -  checkDataTypeJsonRepr(DecimalType(10, 5))
    -  checkDataTypeJsonRepr(DecimalType.SYSTEM_DEFAULT)
    -  checkDataTypeJsonRepr(DateType)
    -  checkDataTypeJsonRepr(TimestampType)
    -  checkDataTypeJsonRepr(StringType)
    -  checkDataTypeJsonRepr(BinaryType)
    -  checkDataTypeJsonRepr(ArrayType(DoubleType, true))
    -  checkDataTypeJsonRepr(ArrayType(StringType, false))
    -  checkDataTypeJsonRepr(MapType(IntegerType, StringType, true))
    -  checkDataTypeJsonRepr(MapType(IntegerType, ArrayType(DoubleType), false))
    +  def checkDataTypeFromDDL(dataType: DataType): Unit = {
    +    test(s"from DDL - $dataType") {
    +      assert(DataType.fromDDL(s"a ${dataType.sql}") === new StructType().add("a", dataType))
    +    }
    +  }
    +
    +  def checkDataTypeFromText(dataType: DataType): Unit = {
    +    checkDataTypeFromJson(dataType)
    +    checkDataTypeFromDDL(dataType)
    +  }
    +
    +  // In some types, check json formats only because the types do not support DDL formats.
    --- End diff --
    
    In Spark SQL, we do not enforce nullability; it is more like a hint for optimization. Not supporting such a syntax is reasonable.
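
Concretely, the DDL format used here has no syntax for non-nullable fields or element types, so those cases can only round-trip through the JSON format; a short sketch of the distinction the test split above reflects:

```
import org.apache.spark.sql.types._

// Parsing DDL always yields nullable fields and containsNull = true elements.
val fromDdl = DataType.fromDDL("a ARRAY<STRING>")
assert(fromDdl === new StructType().add("a", ArrayType(StringType, containsNull = true)))

// A non-nullable element type can therefore only be round-tripped via the JSON format.
val nonNullableArray = ArrayType(StringType, containsNull = false)
assert(DataType.fromJson(nonNullableArray.json) === nonNullableArray)
```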




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Also please update the PR title and description. Thanks for working on it!




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75228/
    Test PASSed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108024156
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
    @@ -36,7 +38,13 @@ class SimpleTextSource extends TextBasedFileFormat with DataSourceRegister {
           sparkSession: SparkSession,
           options: Map[String, String],
           files: Seq[FileStatus]): Option[StructType] = {
    -    Some(DataType.fromJson(options("dataSchema")).asInstanceOf[StructType])
    +    val schemaAsString = options("dataSchema")
    +    val schema = try {
    +      DataType.fromJson(schemaAsString)
    +    } catch {
    +      case NonFatal(_) => DataType.fromDdl(schemaAsString)
    +    }
    +    Some(schema.asInstanceOf[StructType])
    --- End diff --
    
    Aha, I missed the point. Okay, I'll revert this.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    cc @marmbrus @brkyvz 




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75252/testReport)** for PR 17406 at commit [`842ca77`](https://github.com/apache/spark/commit/842ca772cf489ca00b8a16fd428f1e0a234faaab).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75141/testReport)** for PR 17406 at commit [`87abf7b`](https://github.com/apache/spark/commit/87abf7b23f284e3a7728efc90b2ac52a3ce9d294).




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107966362
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
    @@ -36,7 +38,13 @@ class SimpleTextSource extends TextBasedFileFormat with DataSourceRegister {
           sparkSession: SparkSession,
           options: Map[String, String],
           files: Seq[FileStatus]): Option[StructType] = {
    -    Some(DataType.fromJson(options("dataSchema")).asInstanceOf[StructType])
    +    val schemaAsString = options("dataSchema")
    +    val schema = try {
    +      DataType.fromJson(schemaAsString)
    +    } catch {
    +      case NonFatal(_) => DataType.fromDdl(schemaAsString)
    +    }
    +    Some(schema.asInstanceOf[StructType])
    --- End diff --
    
    `SimpleTextSource` is just a fake source. What is the reason you made these changes?




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108049886
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -169,30 +169,45 @@ class DataTypeSuite extends SparkFunSuite {
         assert(!arrayType.existsRecursively(_.isInstanceOf[IntegerType]))
       }
     
    -  def checkDataTypeJsonRepr(dataType: DataType): Unit = {
    -    test(s"JSON - $dataType") {
    +  def checkDataTypeFromJson(dataType: DataType): Unit = {
    +    test(s"from Json - $dataType") {
           assert(DataType.fromJson(dataType.json) === dataType)
         }
       }
     
    -  checkDataTypeJsonRepr(NullType)
    -  checkDataTypeJsonRepr(BooleanType)
    -  checkDataTypeJsonRepr(ByteType)
    -  checkDataTypeJsonRepr(ShortType)
    -  checkDataTypeJsonRepr(IntegerType)
    -  checkDataTypeJsonRepr(LongType)
    -  checkDataTypeJsonRepr(FloatType)
    -  checkDataTypeJsonRepr(DoubleType)
    -  checkDataTypeJsonRepr(DecimalType(10, 5))
    -  checkDataTypeJsonRepr(DecimalType.SYSTEM_DEFAULT)
    -  checkDataTypeJsonRepr(DateType)
    -  checkDataTypeJsonRepr(TimestampType)
    -  checkDataTypeJsonRepr(StringType)
    -  checkDataTypeJsonRepr(BinaryType)
    -  checkDataTypeJsonRepr(ArrayType(DoubleType, true))
    -  checkDataTypeJsonRepr(ArrayType(StringType, false))
    -  checkDataTypeJsonRepr(MapType(IntegerType, StringType, true))
    -  checkDataTypeJsonRepr(MapType(IntegerType, ArrayType(DoubleType), false))
    +  def checkDataTypeFromDDL(dataType: DataType): Unit = {
    +    test(s"from DDL - $dataType") {
    +      assert(DataType.fromDDL(s"a ${dataType.sql}") === new StructType().add("a", dataType))
    +    }
    +  }
    +
    +  def checkDataTypeFromText(dataType: DataType): Unit = {
    +    checkDataTypeFromJson(dataType)
    +    checkDataTypeFromDDL(dataType)
    +  }
    +
    +  // In some types, check json formats only because the types do not support DDL formats.
    +  checkDataTypeFromJson(NullType)
    +
    +  checkDataTypeFromText(BooleanType)
    +  checkDataTypeFromText(ByteType)
    +  checkDataTypeFromText(ShortType)
    +  checkDataTypeFromText(IntegerType)
    +  checkDataTypeFromText(LongType)
    +  checkDataTypeFromText(FloatType)
    +  checkDataTypeFromText(DoubleType)
    +  checkDataTypeFromText(DecimalType(10, 5))
    +  checkDataTypeFromText(DecimalType.SYSTEM_DEFAULT)
    +  checkDataTypeFromText(DateType)
    +  checkDataTypeFromText(TimestampType)
    +  checkDataTypeFromText(StringType)
    +  checkDataTypeFromText(BinaryType)
    +
    +  checkDataTypeFromText(ArrayType(DoubleType, true))
    +  checkDataTypeFromJson(ArrayType(StringType, false))
    +
    +  checkDataTypeFromText(MapType(IntegerType, StringType, true))
    +  checkDataTypeFromJson(MapType(IntegerType, ArrayType(DoubleType), false))
    --- End diff --
    
    okay




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    How about only focusing on the `from_json` function in this PR?




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107837718
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    If this is a new API, why do we need to keep backward compatibility? For the JSON use case, there is already `fromJson`, and I don't think you changed it.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75313/testReport)** for PR 17406 at commit [`8448d7a`](https://github.com/apache/spark/commit/8448d7a7bfa120674b9f9c96a726b04035392bb0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Support DDL strings for defining sche...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    oh, my bad. Fixed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75228/testReport)** for PR 17406 at commit [`6e81235`](https://github.com/apache/spark/commit/6e81235ebd55900f4fa43ad929f9b86755b94028).




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75202/testReport)** for PR 17406 at commit [`ec5452f`](https://github.com/apache/spark/commit/ec5452fb6a3c97a05d29c10b2f843fc96570e092).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Jenkins, retest this please.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108042372
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -169,30 +169,45 @@ class DataTypeSuite extends SparkFunSuite {
         assert(!arrayType.existsRecursively(_.isInstanceOf[IntegerType]))
       }
     
    -  def checkDataTypeJsonRepr(dataType: DataType): Unit = {
    -    test(s"JSON - $dataType") {
    +  def checkDataTypeFromJson(dataType: DataType): Unit = {
    +    test(s"from Json - $dataType") {
           assert(DataType.fromJson(dataType.json) === dataType)
         }
       }
     
    -  checkDataTypeJsonRepr(NullType)
    -  checkDataTypeJsonRepr(BooleanType)
    -  checkDataTypeJsonRepr(ByteType)
    -  checkDataTypeJsonRepr(ShortType)
    -  checkDataTypeJsonRepr(IntegerType)
    -  checkDataTypeJsonRepr(LongType)
    -  checkDataTypeJsonRepr(FloatType)
    -  checkDataTypeJsonRepr(DoubleType)
    -  checkDataTypeJsonRepr(DecimalType(10, 5))
    -  checkDataTypeJsonRepr(DecimalType.SYSTEM_DEFAULT)
    -  checkDataTypeJsonRepr(DateType)
    -  checkDataTypeJsonRepr(TimestampType)
    -  checkDataTypeJsonRepr(StringType)
    -  checkDataTypeJsonRepr(BinaryType)
    -  checkDataTypeJsonRepr(ArrayType(DoubleType, true))
    -  checkDataTypeJsonRepr(ArrayType(StringType, false))
    -  checkDataTypeJsonRepr(MapType(IntegerType, StringType, true))
    -  checkDataTypeJsonRepr(MapType(IntegerType, ArrayType(DoubleType), false))
    +  def checkDataTypeFromDDL(dataType: DataType): Unit = {
    +    test(s"from DDL - $dataType") {
    +      assert(DataType.fromDDL(s"a ${dataType.sql}") === new StructType().add("a", dataType))
    +    }
    +  }
    +
    +  def checkDataTypeFromText(dataType: DataType): Unit = {
    +    checkDataTypeFromJson(dataType)
    +    checkDataTypeFromDDL(dataType)
    +  }
    +
    +  // In some types, check json formats only because the types do not support DDL formats.
    --- End diff --
    
    Actually, except `NullType`, all the following data types should be supported. The only issue is nullability. 
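
For context, a minimal sketch of the nullability gap mentioned above, assuming the `DataType.fromDDL` helper introduced in this PR and only public type constructors; the field name `a` is just for illustration:

```scala
import org.apache.spark.sql.types._

// A DDL field definition carries no element-level nullability marker, so the
// parsed schema comes back with nullable containers even when the original
// type was declared non-nullable.
val original = ArrayType(StringType, containsNull = false)
val parsed = DataType.fromDDL(s"a ${original.sql}")  // parses "a ARRAY<STRING>"

// Strict equality fails only because containsNull flipped back to true.
val strict  = parsed == new StructType().add("a", original)                                   // false
val relaxed = parsed == new StructType().add("a", ArrayType(StringType, containsNull = true)) // true
```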




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    retest this please




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108085204
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,12 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  /**
    +   * Creates DataType for a given DDL-formatted string, which is a comma separated list of field
    +   * definitions, e.g., a INT, b STRING.
    +   */
    +  def fromDDL(ddl: String): DataType = CatalystSqlParser.parseTableSchema(ddl)
    --- End diff --
    
    Shall we use `StructType` as return type?
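
For illustration, a brief sketch of what the current `DataType` return type means for callers, assuming the `fromDDL` definition quoted in the diff above:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// parseTableSchema always builds a StructType at runtime, but with the current
// signature the static type is DataType, so callers have to pattern match
// (or cast) before they can reach the fields.
val schema: DataType = DataType.fromDDL("a INT, b STRING")

schema match {
  case st: StructType => st.fieldNames.foreach(println)  // prints: a, b
  case other          => sys.error(s"unexpected schema type: $other")
}
```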




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75202/
    Test FAILed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75252/
    Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75304/testReport)** for PR 17406 at commit [`9ddd515`](https://github.com/apache/spark/commit/9ddd5154846de06e4096b85586836832503f3bb3).




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108051425
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,12 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  /**
    +   * Creates DataType for a given SQL DDL string, which is a comma separated list of field
    --- End diff --
    
    `SQL DDL string` -> `DDL-formatted string`




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107848745
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    I mean users should use `fromString` just for DDL formats.
    
    It is pretty weird that `from_json` is used to parse both json and DDL format.
    
    How about adding a new `from_ddl` or `from_string`?





[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75227/
    Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75228/testReport)** for PR 17406 at commit [`6e81235`](https://github.com/apache/spark/commit/6e81235ebd55900f4fa43ad929f9b86755b94028).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107851721
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    `DataType.fromString` should only take DDL format. 
    `DataType.fromJson` should only take json.
    
    I don't think `functions.from_json` should accept both the DDL format and JSON like the current change does.
    
    Maybe we can add a new `from_ddl` or `from_string` using `DataType.fromString`, like `from_json` using `DataType.fromJson`?
    
    That is basically my point here. :-)
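
Purely as an illustration of this suggestion (no such overload was added in this PR), a separate entry point might look like the hypothetical sketch below; `from_json_ddl` is a made-up name:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.DataType

// Hypothetical overload keeping the two schema formats apart: the DDL string is
// parsed explicitly and handed to the existing DataType-based from_json.
def from_json_ddl(e: Column, ddlSchema: String,
                  options: java.util.Map[String, String]): Column =
  from_json(e, DataType.fromDDL(ddlSchema), options)
```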





[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75202/testReport)** for PR 17406 at commit [`ec5452f`](https://github.com/apache/spark/commit/ec5452fb6a3c97a05d29c10b2f843fc96570e092).




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Could you update the PR title?
    ```
    [SPARK-20009][SQL] Support DDL strings for defining schema in functions.from_json
    ```





[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108083206
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,12 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  /**
    +   * Creates DataType for a given DDL-formatted string, which is a comma separated list of field
    +   * definitions, e.g., a INT, b STRING.
    --- End diff --
    
    `Creates DataType` -> `Creates StructType`.
    
    Because `parseTableSchema` always returns `StructType`.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75159/testReport)** for PR 17406 at commit [`fb61535`](https://github.com/apache/spark/commit/fb61535c6f3088fe093e49043308b4a33b65fd51).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    LGTM




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107849648
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    Aha, I see. WDYT? cc: @gatorsmile 




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Jenkins, retest this please.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75213/testReport)** for PR 17406 at commit [`ec5452f`](https://github.com/apache/spark/commit/ec5452fb6a3c97a05d29c10b2f843fc96570e092).




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108049895
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -169,30 +169,45 @@ class DataTypeSuite extends SparkFunSuite {
         assert(!arrayType.existsRecursively(_.isInstanceOf[IntegerType]))
       }
     
    -  def checkDataTypeJsonRepr(dataType: DataType): Unit = {
    -    test(s"JSON - $dataType") {
    +  def checkDataTypeFromJson(dataType: DataType): Unit = {
    +    test(s"from Json - $dataType") {
           assert(DataType.fromJson(dataType.json) === dataType)
         }
       }
     
    -  checkDataTypeJsonRepr(NullType)
    -  checkDataTypeJsonRepr(BooleanType)
    -  checkDataTypeJsonRepr(ByteType)
    -  checkDataTypeJsonRepr(ShortType)
    -  checkDataTypeJsonRepr(IntegerType)
    -  checkDataTypeJsonRepr(LongType)
    -  checkDataTypeJsonRepr(FloatType)
    -  checkDataTypeJsonRepr(DoubleType)
    -  checkDataTypeJsonRepr(DecimalType(10, 5))
    -  checkDataTypeJsonRepr(DecimalType.SYSTEM_DEFAULT)
    -  checkDataTypeJsonRepr(DateType)
    -  checkDataTypeJsonRepr(TimestampType)
    -  checkDataTypeJsonRepr(StringType)
    -  checkDataTypeJsonRepr(BinaryType)
    -  checkDataTypeJsonRepr(ArrayType(DoubleType, true))
    -  checkDataTypeJsonRepr(ArrayType(StringType, false))
    -  checkDataTypeJsonRepr(MapType(IntegerType, StringType, true))
    -  checkDataTypeJsonRepr(MapType(IntegerType, ArrayType(DoubleType), false))
    +  def checkDataTypeFromDDL(dataType: DataType): Unit = {
    +    test(s"from DDL - $dataType") {
    +      assert(DataType.fromDDL(s"a ${dataType.sql}") === new StructType().add("a", dataType))
    +    }
    +  }
    +
    +  def checkDataTypeFromText(dataType: DataType): Unit = {
    +    checkDataTypeFromJson(dataType)
    +    checkDataTypeFromDDL(dataType)
    +  }
    +
    +  // In some types, check json formats only because the types do not support DDL formats.
    --- End diff --
    
    Yea, I think so. We'd better file a JIRA for that?




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    LGTM pending Jenkins. 




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75304/
    Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75159/
    Test PASSed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107967957
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3060,8 +3061,17 @@ object functions {
        * @group collection_funcs
        * @since 2.1.0
        */
    -  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
    -    from_json(e, DataType.fromJson(schema), options)
    +  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
    +    // Until Spark-2.1, we use json strings for defining schemas here. Since we add an user-friendly
    --- End diff --
    
    `In Spark-2.1, user-provided `schema` has to be in JSON format`




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107968742
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3060,8 +3061,17 @@ object functions {
        * @group collection_funcs
        * @since 2.1.0
        */
    -  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
    -    from_json(e, DataType.fromJson(schema), options)
    +  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
    +    // Until Spark-2.1, we use json strings for defining schemas here. Since we add an user-friendly
    +    // API in the DDL parser, we employ DDL formats for the case. To keep back-compatibility,
    +    // we use `fromJson` first, and then try the new API.
    --- End diff --
    
    Please move the new description to `@param schema`?




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75213/testReport)** for PR 17406 at commit [`ec5452f`](https://github.com/apache/spark/commit/ec5452fb6a3c97a05d29c10b2f843fc96570e092).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75141/
    Test PASSed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107831959
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    +    try { fromJson(text) } catch { case _: Throwable => CatalystSqlParser.parseTableSchema(text) }
    --- End diff --
    
    `Throwable` -> `NonFatal`
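
A rough sketch of the suggested change (not necessarily the exact code that was merged): catch only non-fatal errors when falling back from the JSON representation to the DDL parser.

```scala
import scala.util.control.NonFatal

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.DataType

// Try the legacy JSON representation first for backward compatibility; fall back
// to the DDL parser only on non-fatal failures so that fatal errors still propagate.
def fromString(text: String): DataType = {
  try {
    DataType.fromJson(text)
  } catch {
    case NonFatal(_) => CatalystSqlParser.parseTableSchema(text)
  }
}
```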




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75313/testReport)** for PR 17406 at commit [`8448d7a`](https://github.com/apache/spark/commit/8448d7a7bfa120674b9f9c96a726b04035392bb0).




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75143/
    Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75213/
    Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75207/testReport)** for PR 17406 at commit [`ec5452f`](https://github.com/apache/spark/commit/ec5452fb6a3c97a05d29c10b2f843fc96570e092).




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108441033
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -169,30 +169,76 @@ class DataTypeSuite extends SparkFunSuite {
         assert(!arrayType.existsRecursively(_.isInstanceOf[IntegerType]))
       }
     
    -  def checkDataTypeJsonRepr(dataType: DataType): Unit = {
    -    test(s"JSON - $dataType") {
    +  def checkDataTypeFromJson(dataType: DataType): Unit = {
    +    test(s"from Json - $dataType") {
           assert(DataType.fromJson(dataType.json) === dataType)
         }
       }
     
    -  checkDataTypeJsonRepr(NullType)
    -  checkDataTypeJsonRepr(BooleanType)
    -  checkDataTypeJsonRepr(ByteType)
    -  checkDataTypeJsonRepr(ShortType)
    -  checkDataTypeJsonRepr(IntegerType)
    -  checkDataTypeJsonRepr(LongType)
    -  checkDataTypeJsonRepr(FloatType)
    -  checkDataTypeJsonRepr(DoubleType)
    -  checkDataTypeJsonRepr(DecimalType(10, 5))
    -  checkDataTypeJsonRepr(DecimalType.SYSTEM_DEFAULT)
    -  checkDataTypeJsonRepr(DateType)
    -  checkDataTypeJsonRepr(TimestampType)
    -  checkDataTypeJsonRepr(StringType)
    -  checkDataTypeJsonRepr(BinaryType)
    -  checkDataTypeJsonRepr(ArrayType(DoubleType, true))
    -  checkDataTypeJsonRepr(ArrayType(StringType, false))
    -  checkDataTypeJsonRepr(MapType(IntegerType, StringType, true))
    -  checkDataTypeJsonRepr(MapType(IntegerType, ArrayType(DoubleType), false))
    +  def checkDataTypeFromDDL(dataType: DataType, ignoreNullability: Boolean = false): Unit = {
    --- End diff --
    
    Aha, I got the point. I'll fix it now.
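
For illustration only (the comment being answered is not quoted in this message), one way such a helper can avoid an `ignoreNullability` flag is to make the caller spell out the expected type after the round trip; `checkDDLRoundTrip` below is a hypothetical name:

```scala
import org.apache.spark.sql.types._

// Hypothetical test helper: the call site states both the input type and the
// type expected back from the DDL round trip, so nullability differences are
// explicit instead of being hidden behind a boolean flag.
def checkDDLRoundTrip(input: DataType, expected: DataType): Unit = {
  val parsed = DataType.fromDDL(s"a ${input.sql}")
  assert(parsed == new StructType().add("a", expected))
}

// ARRAY<STRING> comes back with containsNull = true regardless of the input.
checkDDLRoundTrip(
  ArrayType(StringType, containsNull = false),
  ArrayType(StringType, containsNull = true))
```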




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107852982
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    Oh. Got it. I got confused by the name of `from_json`...
    
    Right. It is used to parse a column of json data...




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107968297
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3060,8 +3061,17 @@ object functions {
        * @group collection_funcs
        * @since 2.1.0
        */
    -  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
    -    from_json(e, DataType.fromJson(schema), options)
    +  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
    +    // Until Spark-2.1, we use json strings for defining schemas here. Since we add an user-friendly
    +    // API in the DDL parser, we employ DDL formats for the case. To keep back-compatibility,
    +    // we use `fromJson` first, and then try the new API.
    --- End diff --
    
    How about?
    ```
    In Spark 2.1, the user-provided schema has to be in JSON format. Since Spark 2.2, the DDL format is also supported for schema. 
    ```
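
To make that documented behaviour concrete, a small usage sketch (it assumes a DataFrame `df` with a string column named `value` holding JSON records such as `{"a": 1, "b": "x"}`; names are illustrative):

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val options = new java.util.HashMap[String, String]()

// Spark 2.1 style: the schema passed as the JSON representation of a StructType.
val jsonSchema = new StructType().add("a", IntegerType).add("b", StringType).json
val viaJson = df.select(from_json(col("value"), jsonSchema, options))

// Spark 2.2 style (this PR): the same schema written as a DDL string.
val viaDdl = df.select(from_json(col("value"), "a INT, b STRING", options))
```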




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75296/testReport)** for PR 17406 at commit [`842ca77`](https://github.com/apache/spark/commit/842ca772cf489ca00b8a16fd428f1e0a234faaab).




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107851214
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    Sorry for interrupting. Actually, my point is a bit different. I thought `DataType.fromString` should take only the SQL type string, with proper documentation, unless we are going to deprecate `DataType.fromJson` for this purpose, and both cases should be handled either within `functions.from_json` or elsewhere.
    
    I guess `functions.from_json` refers to the data itself being in JSON format, not the schema. So probably, it is fine if we document this difference.
    





[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75304/testReport)** for PR 17406 at commit [`9ddd515`](https://github.com/apache/spark/commit/9ddd5154846de06e4096b85586836832503f3bb3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Just submitted a fix. NVM




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75231/testReport)** for PR 17406 at commit [`e2d121f`](https://github.com/apache/spark/commit/e2d121f214c515cbe043d7954c367570048944b7).




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75236/
    Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    @gatorsmile okay, please. Thanks!




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Ok. LGTM.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75303/testReport)** for PR 17406 at commit [`83f5d6e`](https://github.com/apache/spark/commit/83f5d6efe31e980576d6fc7515afe3e08a443e01).




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107967167
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3060,8 +3061,17 @@ object functions {
        * @group collection_funcs
        * @since 2.1.0
        */
    -  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
    -    from_json(e, DataType.fromJson(schema), options)
    +  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
    +    // Until Spark-2.1, we use json strings for defining schemas here. Since we add an user-friendly
    --- End diff --
    
    `an user-friendly` -> `a user-friendly`
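    
    For reference, a minimal usage sketch of the string-schema overload touched in this
    diff, with the schema given in the new DDL form; the input data, column name, the
    local master setting, and the empty `java.util.HashMap` options map are made up for
    illustration (spark-shell style snippet):
    
        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions.from_json
        
        val spark = SparkSession.builder().master("local[*]").appName("ddl-schema").getOrCreate()
        import spark.implicits._
        
        // A single JSON record and a schema written as a DDL string.
        val df = Seq("""{"a": 1, "b": "x"}""").toDF("json")
        df.select(from_json($"json", "a INT, b STRING", new java.util.HashMap[String, String]()).as("parsed"))
          .show(false)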




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107853315
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    If we are going to support both the JSON schema string and the DDL format in `from_json`, shall we add a `DataType.fromStringOrJson` that does exactly what the current `DataType.fromString` does, and let `DataType.fromString` accept only the DDL format?
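    
    A minimal sketch (not code from this PR) of the split suggested above; the object
    name and `fromStringOrJson` are hypothetical, and `scala.util.Try` is just one way
    to express the JSON-first fallback:
    
        import scala.util.Try
        
        import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
        import org.apache.spark.sql.types.{DataType, StructType}
        
        object SchemaStrings {
          // DDL-only entry point: a comma separated field list such as "a INT, b STRING".
          def fromDDL(ddl: String): StructType = CatalystSqlParser.parseTableSchema(ddl)
        
          // Backward-compatible helper: try the legacy JSON form first, then the DDL parser.
          def fromStringOrJson(text: String): DataType =
            Try(DataType.fromJson(text)).getOrElse(fromDDL(text))
        }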




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107964919
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,8 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  def fromDdl(ddl: String): DataType = CatalystSqlParser.parseTableSchema(ddl)
    --- End diff --
    
    `fromDdl` -> `fromDDL` 




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107852542
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +106,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    --- End diff --
    
    Yup, to cut my comment even shorter: for me,
    
    +1 only for ...
    
     >`DataType.fromString` should only take DDL format.
    > `DataType.fromJson` should only take json.
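    
    For comparison, a small sketch of the two string forms being kept separate here;
    the field names are arbitrary:
    
        import org.apache.spark.sql.types._
        
        // DDL form, accepted by the new parser-based API.
        val ddlSchema = "a INT, b STRING"
        
        // Legacy JSON form, as produced by StructType.json.
        val jsonSchema = new StructType()
          .add("a", IntegerType)
          .add("b", StringType)
          .json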





[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75143/testReport)** for PR 17406 at commit [`2afa6f3`](https://github.com/apache/spark/commit/2afa6f3ebd97453c2813f50fa433f9a1d027c554).




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107965254
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,8 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  def fromDdl(ddl: String): DataType = CatalystSqlParser.parseTableSchema(ddl)
    --- End diff --
    
    Also, please add comments above these functions to explain their purpose.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108337849
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3055,13 +3056,21 @@ object functions {
        * with the specified schema. Returns `null`, in the case of an unparseable string.
        *
        * @param e a string column containing JSON data.
    -   * @param schema the schema to use when parsing the json string as a json string
    +   * @param schema the schema to use when parsing the json string as a json string. In Spark 2.1,
    +   *               the user-provided schema has to be in JSON format. Since Spark 2.2, the DDL
    +   *               format is also supported for the schema.
        *
        * @group collection_funcs
        * @since 2.1.0
        */
    -  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column =
    -    from_json(e, DataType.fromJson(schema), options)
    +  def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
    +    val dataType = try {
    --- End diff --
    
    A little concern: Won't the error message from parsing json be shadowed?
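    
    A sketch of the concern and one way to avoid it, keeping both failure messages
    instead of letting the DDL error replace the JSON one; the helper name and the
    exact wording are assumptions, not code from this PR:
    
        import scala.util.control.NonFatal
        
        import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
        import org.apache.spark.sql.types.DataType
        
        def parseSchemaString(schema: String): DataType =
          try {
            DataType.fromJson(schema)
          } catch {
            case NonFatal(jsonError) =>
              try {
                CatalystSqlParser.parseTableSchema(schema)
              } catch {
                case NonFatal(ddlError) =>
                  // Report both parse failures so neither message is shadowed.
                  throw new IllegalArgumentException(
                    s"Schema is neither valid JSON (${jsonError.getMessage}) " +
                      s"nor a valid DDL string (${ddlError.getMessage})")
              }
          }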




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    @gatorsmile could you check?




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75196/testReport)** for PR 17406 at commit [`ec5452f`](https://github.com/apache/spark/commit/ec5452fb6a3c97a05d29c10b2f843fc96570e092).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    The issue was fixed in https://github.com/apache/spark/commit/a2ce0a2e309e70d74ae5d2ed203f7919a0f79397, so I'll retest.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75252/testReport)** for PR 17406 at commit [`842ca77`](https://github.com/apache/spark/commit/842ca772cf489ca00b8a16fd428f1e0a234faaab).




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75143/testReport)** for PR 17406 at commit [`2afa6f3`](https://github.com/apache/spark/commit/2afa6f3ebd97453c2813f50fa433f9a1d027c554).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108349691
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,12 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  /**
    +   * Creates StructType for a given DDL-formatted string, which is a comma separated list of field
    +   * definitions, e.g., a INT, b STRING.
    +   */
    +  def fromDDL(ddl: String): StructType = CatalystSqlParser.parseTableSchema(ddl)
    --- End diff --
    
    okay, I'll do it soon
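    
    For reference, a tiny usage sketch of the helper documented in the diff above; the
    expected value is spelled out only for illustration:
    
        import org.apache.spark.sql.types._
        
        // "a INT, b STRING" parses to a two-field StructType (fields nullable by default).
        val parsed = DataType.fromDDL("a INT, b STRING")
        val expected = new StructType().add("a", IntegerType).add("b", StringType)
        assert(parsed == expected)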




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    okay, I'll update soon! Thanks!




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108434080
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -169,30 +169,76 @@ class DataTypeSuite extends SparkFunSuite {
         assert(!arrayType.existsRecursively(_.isInstanceOf[IntegerType]))
       }
     
    -  def checkDataTypeJsonRepr(dataType: DataType): Unit = {
    -    test(s"JSON - $dataType") {
    +  def checkDataTypeFromJson(dataType: DataType): Unit = {
    +    test(s"from Json - $dataType") {
           assert(DataType.fromJson(dataType.json) === dataType)
         }
       }
     
    -  checkDataTypeJsonRepr(NullType)
    -  checkDataTypeJsonRepr(BooleanType)
    -  checkDataTypeJsonRepr(ByteType)
    -  checkDataTypeJsonRepr(ShortType)
    -  checkDataTypeJsonRepr(IntegerType)
    -  checkDataTypeJsonRepr(LongType)
    -  checkDataTypeJsonRepr(FloatType)
    -  checkDataTypeJsonRepr(DoubleType)
    -  checkDataTypeJsonRepr(DecimalType(10, 5))
    -  checkDataTypeJsonRepr(DecimalType.SYSTEM_DEFAULT)
    -  checkDataTypeJsonRepr(DateType)
    -  checkDataTypeJsonRepr(TimestampType)
    -  checkDataTypeJsonRepr(StringType)
    -  checkDataTypeJsonRepr(BinaryType)
    -  checkDataTypeJsonRepr(ArrayType(DoubleType, true))
    -  checkDataTypeJsonRepr(ArrayType(StringType, false))
    -  checkDataTypeJsonRepr(MapType(IntegerType, StringType, true))
    -  checkDataTypeJsonRepr(MapType(IntegerType, ArrayType(DoubleType), false))
    +  def checkDataTypeFromDDL(dataType: DataType, ignoreNullability: Boolean = false): Unit = {
    --- End diff --
    
    Why do we have the `ignoreNullability` parameter? `DataType.sql` doesn't contain nullability information, so we should not expect to retain nullability here. We should always compare the types with `sameType`.
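    
    A small illustration of the point, written as it might appear inside Spark's own
    test sources (e.g. `DataTypeSuite`), where the package-private `sameType` is
    visible; the concrete fields are arbitrary:
    
        import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
        import org.apache.spark.sql.types._
        
        // A schema that carries explicit nullability information.
        val original = new StructType()
          .add("a", IntegerType, nullable = false)
          .add("b", ArrayType(DoubleType, containsNull = false))
        
        // The DDL string has no way to express nullability, so the parsed
        // result comes back fully nullable.
        val parsed = CatalystSqlParser.parseTableSchema("a INT, b ARRAY<DOUBLE>")
        
        assert(parsed != original)        // strict equality sees the nullability gap
        assert(parsed.sameType(original)) // sameType ignores nullability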




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r107831993
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,13 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  // Until Spark-2.1, we use json strings for defining schemas in user-facing APIs.
    +  // Since we add an user-friendly API in the DDL parser, we employ DDL formats for the case.
    +  // To keep back-compatibility, we use `fromJson` first, and then try the new API.
    +  def fromString(text: String): DataType = {
    +    try { fromJson(text) } catch { case _: Throwable => CatalystSqlParser.parseTableSchema(text) }
    --- End diff --
    
    okay




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75196/
    Test FAILed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75231/
    Test FAILed.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108087790
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
    @@ -103,6 +104,12 @@ object DataType {
     
       def fromJson(json: String): DataType = parseDataType(parse(json))
     
    +  /**
    +   * Creates DataType for a given DDL-formatted string, which is a comma separated list of field
    +   * definitions, e.g., a INT, b STRING.
    --- End diff --
    
    Thanks for the comments! Fixed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    @maropu Could you update the PR title? Once that's done, I can merge it. Thanks!




[GitHub] spark issue #17406: [SPARK-20009][SQL] Support DDL strings for defining sche...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Thanks! Merging to master.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75207/
    Test FAILed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75296/
    Test FAILed.




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75303/testReport)** for PR 17406 at commit [`83f5d6e`](https://github.com/apache/spark/commit/83f5d6efe31e980576d6fc7515afe3e08a443e01).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17406: [SPARK-20009][SQL] Use DDL strings for defining s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17406#discussion_r108042419
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -201,7 +216,7 @@ class DataTypeSuite extends SparkFunSuite {
         StructField("a", IntegerType, nullable = true),
         StructField("b", ArrayType(DoubleType), nullable = false),
         StructField("c", DoubleType, nullable = false, metadata)))
    -  checkDataTypeJsonRepr(structType)
    +  checkDataTypeFromJson(structType)
    --- End diff --
    
    The same here




[GitHub] spark issue #17406: [SPARK-20009][SQL] Use DDL strings for defining schema i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17406
  
    **[Test build #75236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75236/testReport)** for PR 17406 at commit [`e2d121f`](https://github.com/apache/spark/commit/e2d121f214c515cbe043d7954c367570048944b7).

