Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2014/09/28 15:05:56 UTC

[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/2563

    [SPARK-3713][SQL] Uses JSON to serialize DataType objects

    This PR uses JSON instead of `toString` to serialize `DataType`s. The latter is not only hard to parse but also flaky in many cases.
    
    Since we already write schema information to Parquet metadata in the old format, we have to keep the old `DataType` parser around to ensure backward compatibility. The old parser is now renamed to `CaseClassStringParser` and moved into `object DataType`.
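
    For a rough illustration only (the JSON key names below are hypothetical, not necessarily the exact layout this patch emits), the difference between the two representations can be sketched like this:

    ```
    import json

    # Old style: DataType.toString produced strings such as
    #   StructType(List(StructField(name,StringType,true)))
    # which needed the hand-written CaseClassStringParser to read back.
    # A JSON encoding round-trips with any standard JSON library:
    schema_json = json.dumps(
        {"type": "struct",
         "fields": [{"name": "name", "type": "string", "nullable": True}]})
    assert json.loads(schema_json)["fields"][0]["type"] == "string"
    ```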
    
    @JoshRosen @davis Please help review PySpark related changes, thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark datatype-to-json

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2563.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2563
    
----
commit dca9153d213a9a9603d7b327d78750af66021ed2
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-25T09:28:06Z

    De/serializes DataType objects from/to JSON

commit 5f792df158128f6bf41a49e816a915150698a9d2
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-28T11:19:34Z

    Adds PySpark support

commit 26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-28T11:54:26Z

    Adds compatibility test case for Parquet type conversion

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57648680
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21205/




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57732079
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21227/consoleFull) for   PR 2563 at commit [`81e28fb`](https://github.com/apache/spark/commit/81e28fbf89d65202d8b934a9d98c3c60fce2e2a2).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class GetPeers(blockManagerId: BlockManagerId) extends ToBlockManagerMaster`





[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57090932
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20939/




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58299048
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21437/




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58299045
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21437/consoleFull) for   PR 2563 at commit [`fc92eb3`](https://github.com/apache/spark/commit/fc92eb3ad82c998d5f0ea4e94d730a6e90185d9e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57084959
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20938/consoleFull) for   PR 2563 at commit [`26c6563`](https://github.com/apache/spark/commit/26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57899624
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21289/




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18325514
  
    --- Diff: python/pyspark/sql.py ---
    @@ -205,6 +234,16 @@ def __str__(self):
             return "ArrayType(%s,%s)" % (self.elementType,
                                          str(self.containsNull).lower())
     
    +    simpleString = 'array'
    +
    +    def jsonValue(self):
    +        return {
    +            self.simpleString: {
    +                'type': self.elementType.jsonValue(),
    +                'containsNull': self.containsNull
    +            }
    +        }
    --- End diff --
    
    I'd like this one:
    ```
    {self.simpleString: {'type': self.elementType.jsonValue(),
                                   'containsNull': self.containsNull}}
    ```
    It would be better if it had only one layer:
    ```
    {'type': self.simpleString, 
     'element': self.elementType.jsonValue(), 
     'containsNull': self.containsNull}
    ```
    
    I personally prefer fewer lines, so I can read more code on one screen.
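
    A minimal, self-contained sketch of that one-level layout (the class and key names here are only illustrative, not the patch's actual API):
    ```
    class StringType(object):
        def jsonValue(self):
            return 'string'                  # primitive: just a bare string

    class ArrayType(object):
        def __init__(self, elementType, containsNull=True):
            self.elementType = elementType
            self.containsNull = containsNull

        def jsonValue(self):
            # one level: the type tag and its parameters sit side by side
            return {'type': 'array',
                    'element': self.elementType.jsonValue(),
                    'containsNull': self.containsNull}

    print(ArrayType(StringType()).jsonValue())
    # {'type': 'array', 'element': 'string', 'containsNull': True}  (key order may vary)
    ```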




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57084987
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20938/consoleFull) for   PR 2563 at commit [`26c6563`](https://github.com/apache/spark/commit/26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57898948
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21291/




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57544564
  
    Minor comment otherwise this LGTM.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18321352
  
    --- Diff: python/pyspark/sql.py ---
    @@ -205,6 +234,16 @@ def __str__(self):
             return "ArrayType(%s,%s)" % (self.elementType,
                                          str(self.containsNull).lower())
     
    +    simpleString = 'array'
    +
    +    def jsonValue(self):
    +        return {
    +            self.simpleString: {
    +                'type': self.elementType.jsonValue(),
    +                'containsNull': self.containsNull
    +            }
    +        }
    --- End diff --
    
    This looks like JS style; it could fit in fewer lines.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58307821
  
    @marmbrus I think this is ready to go.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57732086
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21227/




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58234335
  
    Could you rebase this to master?




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57897375
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21289/consoleFull) for   PR 2563 at commit [`54c46ce`](https://github.com/apache/spark/commit/54c46ce607c521df4bea390d3cac7d42a6f006f8).
     * This patch **does not** merge cleanly!




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57898946
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21291/consoleFull) for   PR 2563 at commit [`785b683`](https://github.com/apache/spark/commit/785b6834e4f0ea24a3b5be4c55d675b8687b12c9).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18383911
  
    --- Diff: python/pyspark/sql.py ---
    @@ -62,6 +67,17 @@ def __eq__(self, other):
         def __ne__(self, other):
             return not self.__eq__(other)
     
    +    def simpleString(self):
    +        return _get_simple_string(self.__class__)
    --- End diff --
    
    why not just put _get_simple_string here?




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57926775
  
    LGTM now, thanks!




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57722826
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21227/consoleFull) for   PR 2563 at commit [`81e28fb`](https://github.com/apache/spark/commit/81e28fbf89d65202d8b934a9d98c3c60fce2e2a2).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18345824
  
    --- Diff: python/pyspark/sql.py ---
    @@ -385,50 +429,32 @@ def _parse_datatype_string(datatype_string):
         >>> check_datatype(complex_maptype)
         True
         """
    -    index = datatype_string.find("(")
    -    if index == -1:
    -        # It is a primitive type.
    -        index = len(datatype_string)
    -    type_or_field = datatype_string[:index]
    -    rest_part = datatype_string[index + 1:len(datatype_string) - 1].strip()
    -
    -    if type_or_field in _all_primitive_types:
    -        return _all_primitive_types[type_or_field]()
    -
    -    elif type_or_field == "ArrayType":
    -        last_comma_index = rest_part.rfind(",")
    -        containsNull = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            containsNull = False
    -        elementType = _parse_datatype_string(
    -            rest_part[:last_comma_index].strip())
    -        return ArrayType(elementType, containsNull)
    -
    -    elif type_or_field == "MapType":
    -        last_comma_index = rest_part.rfind(",")
    -        valueContainsNull = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            valueContainsNull = False
    -        keyType, valueType = _parse_datatype_list(
    -            rest_part[:last_comma_index].strip())
    -        return MapType(keyType, valueType, valueContainsNull)
    -
    -    elif type_or_field == "StructField":
    -        first_comma_index = rest_part.find(",")
    -        name = rest_part[:first_comma_index].strip()
    -        last_comma_index = rest_part.rfind(",")
    -        nullable = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            nullable = False
    -        dataType = _parse_datatype_string(
    -            rest_part[first_comma_index + 1:last_comma_index].strip())
    -        return StructField(name, dataType, nullable)
    -
    -    elif type_or_field == "StructType":
    -        # rest_part should be in the format like
    -        # List(StructField(field1,IntegerType,false)).
    -        field_list_string = rest_part[rest_part.find("(") + 1:-1]
    -        fields = _parse_datatype_list(field_list_string)
    +    return _parse_datatype_json_value(json.loads(json_string))
    +
    +
    +def _parse_datatype_json_value(json_value):
    +    if json_value in _all_primitive_types.keys():
    --- End diff --
    
    Thanks for the unhashable hint. I'd like to keep the resulting JSON string as compact as possible; that's why all primitive types are serialized to a single string. I'll add a type check here.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57087294
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20939/consoleFull) for   PR 2563 at commit [`03da3ec`](https://github.com/apache/spark/commit/03da3ec870940bd6ff56e03450993da6125b40a4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2563




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57084988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20938/




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18321520
  
    --- Diff: python/pyspark/sql.py ---
    @@ -385,50 +429,32 @@ def _parse_datatype_string(datatype_string):
         >>> check_datatype(complex_maptype)
         True
         """
    -    index = datatype_string.find("(")
    -    if index == -1:
    -        # It is a primitive type.
    -        index = len(datatype_string)
    -    type_or_field = datatype_string[:index]
    -    rest_part = datatype_string[index + 1:len(datatype_string) - 1].strip()
    -
    -    if type_or_field in _all_primitive_types:
    -        return _all_primitive_types[type_or_field]()
    -
    -    elif type_or_field == "ArrayType":
    -        last_comma_index = rest_part.rfind(",")
    -        containsNull = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            containsNull = False
    -        elementType = _parse_datatype_string(
    -            rest_part[:last_comma_index].strip())
    -        return ArrayType(elementType, containsNull)
    -
    -    elif type_or_field == "MapType":
    -        last_comma_index = rest_part.rfind(",")
    -        valueContainsNull = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            valueContainsNull = False
    -        keyType, valueType = _parse_datatype_list(
    -            rest_part[:last_comma_index].strip())
    -        return MapType(keyType, valueType, valueContainsNull)
    -
    -    elif type_or_field == "StructField":
    -        first_comma_index = rest_part.find(",")
    -        name = rest_part[:first_comma_index].strip()
    -        last_comma_index = rest_part.rfind(",")
    -        nullable = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            nullable = False
    -        dataType = _parse_datatype_string(
    -            rest_part[first_comma_index + 1:last_comma_index].strip())
    -        return StructField(name, dataType, nullable)
    -
    -    elif type_or_field == "StructType":
    -        # rest_part should be in the format like
    -        # List(StructField(field1,IntegerType,false)).
    -        field_list_string = rest_part[rest_part.find("(") + 1:-1]
    -        fields = _parse_datatype_list(field_list_string)
    +    return _parse_datatype_json_value(json.loads(json_string))
    +
    +
    +def _parse_datatype_json_value(json_value):
    +    if json_value in _all_primitive_types.keys():
    --- End diff --
    
    If json_value is a dict (e.g. {}), it's not hashable, so you cannot use 'in' with it here.
    
    I would like to use the same kind of json_value for all types, e.g. a dict with a key called 'type', such as:
    
    ```
    {'type': 'int'}
    ``` 
    For other types, it could have additional keys depending on the type, such as:
    
    ```
    {'type':'array', 'element': {'type':'int'}, 'null': True}
    ```
    
    This way, it will be easier to do the type switch.
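
    A small runnable sketch of that kind of type switch (the stand-in classes here are illustrative, not the real pyspark ones):
    ```
    from collections import namedtuple

    IntegerType = namedtuple('IntegerType', [])
    ArrayType = namedtuple('ArrayType', ['elementType', 'containsNull'])

    def _parse_json_value(json_value):
        # every value is a dict carrying its own 'type' tag,
        # so the switch is a flat if/elif chain
        kind = json_value['type']
        if kind == 'int':
            return IntegerType()
        elif kind == 'array':
            return ArrayType(_parse_json_value(json_value['element']),
                             json_value['null'])
        raise ValueError('unknown type: %r' % kind)

    print(_parse_json_value({'type': 'array', 'element': {'type': 'int'}, 'null': True}))
    # ArrayType(elementType=IntegerType(), containsNull=True)
    ```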




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57272666
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/210/consoleFull) for   PR 2563 at commit [`03da3ec`](https://github.com/apache/spark/commit/03da3ec870940bd6ff56e03450993da6125b40a4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18321283
  
    --- Diff: python/pyspark/sql.py ---
    @@ -62,6 +63,12 @@ def __eq__(self, other):
         def __ne__(self, other):
             return not self.__eq__(other)
     
    +    def jsonValue(self):
    +        return self.simpleString
    --- End diff --
    
    You could have a default implementation like:
    
    self.__class__.__name__[:-4].lower()
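
    That is, something along these lines (sketch only), so every `FooType` subclass gets its simple string for free:
    ```
    class DataType(object):
        def jsonValue(self):
            # strip the trailing 'Type' from the class name and lowercase it
            return self.__class__.__name__[:-4].lower()

    class IntegerType(DataType):
        pass

    class StringType(DataType):
        pass

    print(IntegerType().jsonValue())   # integer
    print(StringType().jsonValue())    # string
    ```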




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58300431
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/285/consoleFull) for   PR 2563 at commit [`fc92eb3`](https://github.com/apache/spark/commit/fc92eb3ad82c998d5f0ea4e94d730a6e90185d9e).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(inputFile: String = null, threshold: Double = 0.1)`
      * `class Word2VecModel(object):`
      * `class Word2Vec(object):`





[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57925711
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21305/




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18432669
  
    --- Diff: python/run-tests ---
    @@ -60,56 +60,58 @@ fi
     echo "Testing with Python version:"
     $PYSPARK_PYTHON --version
     
    -run_test "pyspark/rdd.py"
    -run_test "pyspark/context.py"
    -run_test "pyspark/conf.py"
     run_test "pyspark/sql.py"
    -# These tests are included in the module-level docs, and so must
    --- End diff --
    
    You can set up the path in your bashrc:
    ```
    export SPARK_HOME=path_to_spark
    export PYTHONPATH=${SPARK_HOME}/python/:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip
    ```
    Then you can run any PySpark job directly with python (or run a single test):
    ```
    python python/pyspark/sql.py
    ```




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18377127
  
    --- Diff: python/pyspark/sql.py ---
    @@ -62,6 +63,12 @@ def __eq__(self, other):
         def __ne__(self, other):
             return not self.__eq__(other)
     
    +    def jsonValue(self):
    +        return self.simpleString
    --- End diff --
    
    Thanks for this, it saved lots of boilerplate code! Removed all the `simpleString()` methods in subclasses.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57924412
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21305/consoleFull) for   PR 2563 at commit [`de18dea`](https://github.com/apache/spark/commit/de18dead6077327e8870841a6194894ba51b5b9f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18399479
  
    --- Diff: python/pyspark/sql.py ---
    @@ -312,42 +343,24 @@ def __repr__(self):
             return ("StructType(List(%s))" %
                     ",".join(str(field) for field in self.fields))
     
    +    def jsonValue(self):
    +        return {self.simpleString():
    +                {'fields': map(lambda f: f.jsonValue(), self.fields)}}
     
    -def _parse_datatype_list(datatype_list_string):
    -    """Parses a list of comma separated data types."""
    -    index = 0
    -    datatype_list = []
    -    start = 0
    -    depth = 0
    -    while index < len(datatype_list_string):
    -        if depth == 0 and datatype_list_string[index] == ",":
    -            datatype_string = datatype_list_string[start:index].strip()
    -            datatype_list.append(_parse_datatype_string(datatype_string))
    -            start = index + 1
    -        elif datatype_list_string[index] == "(":
    -            depth += 1
    -        elif datatype_list_string[index] == ")":
    -            depth -= 1
    -
    -        index += 1
    -
    -    # Handle the last data type
    -    datatype_string = datatype_list_string[start:index].strip()
    -    datatype_list.append(_parse_datatype_string(datatype_string))
    -    return datatype_list
     
    +_all_primitive_types = dict((_get_simple_string(v), v)
    +                            for v in globals().itervalues()
    --- End diff --
    
    `simpleString` is the name chosen in the Scala code; I agree that `typeName` is better.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58446924
  
    Thanks! I've merged this.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57923419
  
    @davies Sorry for my carelessness... And thanks again for all the great advice!




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57922847
  
    @liancheng You mentioned another guy; my ID is davies.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18342925
  
    --- Diff: python/pyspark/sql.py ---
    @@ -205,6 +234,16 @@ def __str__(self):
             return "ArrayType(%s,%s)" % (self.elementType,
                                          str(self.containsNull).lower())
     
    +    simpleString = 'array'
    +
    +    def jsonValue(self):
    +        return {
    +            self.simpleString: {
    +                'type': self.elementType.jsonValue(),
    +                'containsNull': self.containsNull
    +            }
    +        }
    --- End diff --
    
    Thanks, I like this style :)




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18384024
  
    --- Diff: python/pyspark/sql.py ---
    @@ -312,42 +343,24 @@ def __repr__(self):
             return ("StructType(List(%s))" %
                     ",".join(str(field) for field in self.fields))
     
    +    def jsonValue(self):
    +        return {self.simpleString():
    +                {'fields': map(lambda f: f.jsonValue(), self.fields)}}
     
    -def _parse_datatype_list(datatype_list_string):
    -    """Parses a list of comma separated data types."""
    -    index = 0
    -    datatype_list = []
    -    start = 0
    -    depth = 0
    -    while index < len(datatype_list_string):
    -        if depth == 0 and datatype_list_string[index] == ",":
    -            datatype_string = datatype_list_string[start:index].strip()
    -            datatype_list.append(_parse_datatype_string(datatype_string))
    -            start = index + 1
    -        elif datatype_list_string[index] == "(":
    -            depth += 1
    -        elif datatype_list_string[index] == ")":
    -            depth -= 1
    -
    -        index += 1
    -
    -    # Handle the last data type
    -    datatype_string = datatype_list_string[start:index].strip()
    -    datatype_list.append(_parse_datatype_string(datatype_string))
    -    return datatype_list
     
    +_all_primitive_types = dict((_get_simple_string(v), v)
    +                            for v in globals().itervalues()
    --- End diff --
    
    It's better to call v.simpleString(); maybe rename it to `typeName`?




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57897538
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21291/consoleFull) for   PR 2563 at commit [`785b683`](https://github.com/apache/spark/commit/785b6834e4f0ea24a3b5be4c55d675b8687b12c9).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57090930
  
    **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20939/consoleFull)** after a configured wait of `120m`.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57923097
  
    This looks good to me; you just forgot to roll back the changes in run-tests after debugging.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57900055
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/265/consoleFull) for   PR 2563 at commit [`785b683`](https://github.com/apache/spark/commit/785b6834e4f0ea24a3b5be4c55d675b8687b12c9).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57925709
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21305/consoleFull) for   PR 2563 at commit [`de18dea`](https://github.com/apache/spark/commit/de18dead6077327e8870841a6194894ba51b5b9f).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18432681
  
    --- Diff: python/pyspark/sql.py ---
    @@ -62,6 +63,18 @@ def __eq__(self, other):
         def __ne__(self, other):
             return not self.__eq__(other)
     
    +    @classmethod
    +    def typeName(cls):
    +        return cls.__name__[:-4].lower()
    +
    +    def jsonValue(self):
    +        return {"type": self.typeName()}
    --- End diff --
    
    If you'd like to use a single string for primitive types, that's still doable; just use a one-layer dict for the others.
    
    Either one is good to me.
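
    For example, the hybrid encoding could branch on the JSON value's shape, roughly like this (a sketch with made-up helper names, returning textual descriptions instead of real DataType objects just to stay short):
    ```
    import json

    def parse(json_value):
        # primitives come back as bare strings, complex types as one-level dicts
        if not isinstance(json_value, dict):
            return '%sType()' % str(json_value).capitalize()
        if json_value['type'] == 'array':
            return 'ArrayType(%s, containsNull=%s)' % (
                parse(json_value['element']), json_value['containsNull'])
        raise ValueError('unknown type: %r' % json_value)

    print(parse(json.loads('"integer"')))
    # IntegerType()
    print(parse(json.loads('{"type": "array", "element": "integer", "containsNull": true}')))
    # ArrayType(IntegerType(), containsNull=True)
    ```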




[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18384157
  
    --- Diff: python/pyspark/sql.py ---
    @@ -385,51 +398,35 @@ def _parse_datatype_string(datatype_string):
         >>> check_datatype(complex_maptype)
         True
         """
    -    index = datatype_string.find("(")
    -    if index == -1:
    -        # It is a primitive type.
    -        index = len(datatype_string)
    -    type_or_field = datatype_string[:index]
    -    rest_part = datatype_string[index + 1:len(datatype_string) - 1].strip()
    -
    -    if type_or_field in _all_primitive_types:
    -        return _all_primitive_types[type_or_field]()
    -
    -    elif type_or_field == "ArrayType":
    -        last_comma_index = rest_part.rfind(",")
    -        containsNull = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            containsNull = False
    -        elementType = _parse_datatype_string(
    -            rest_part[:last_comma_index].strip())
    -        return ArrayType(elementType, containsNull)
    -
    -    elif type_or_field == "MapType":
    -        last_comma_index = rest_part.rfind(",")
    -        valueContainsNull = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            valueContainsNull = False
    -        keyType, valueType = _parse_datatype_list(
    -            rest_part[:last_comma_index].strip())
    -        return MapType(keyType, valueType, valueContainsNull)
    -
    -    elif type_or_field == "StructField":
    -        first_comma_index = rest_part.find(",")
    -        name = rest_part[:first_comma_index].strip()
    -        last_comma_index = rest_part.rfind(",")
    -        nullable = True
    -        if rest_part[last_comma_index + 1:].strip().lower() == "false":
    -            nullable = False
    -        dataType = _parse_datatype_string(
    -            rest_part[first_comma_index + 1:last_comma_index].strip())
    -        return StructField(name, dataType, nullable)
    -
    -    elif type_or_field == "StructType":
    -        # rest_part should be in the format like
    -        # List(StructField(field1,IntegerType,false)).
    -        field_list_string = rest_part[rest_part.find("(") + 1:-1]
    -        fields = _parse_datatype_list(field_list_string)
    +    return _parse_datatype_json_value(json.loads(json_string))
    +
    +
    +def _parse_datatype_json_value(json_value):
    +    if type(json_value) is unicode and json_value in _all_primitive_types.keys():
    +        return _all_primitive_types[json_value]()
    +    elif 'array' in json_value:
    +        array_type = json_value['array']
    +        element_type = _parse_datatype_json_value(array_type['type'])
    +        contains_null = array_type['containsNull']
    +        return ArrayType(element_type, contains_null)
    --- End diff --
    
    If the jsonValue has only one level, then these lines can be written like this:
    ```
    if json_value['type'] == 'array':
        return ArrayType(json_value['element'], json_value['containsNull'])
    ```
    Also, it will be much easier to do it like this:

    ```
    class ArrayType:
        @classmethod
        def load_from_json(cls, json):
            return ArrayType(json['element'], json['containsNull'])

    types[json_value['type']].load_from_json(json_value)
    ```
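
    Spelled out as a self-contained sketch, that registry dispatch might look like the following (the stub classes, the `_types_by_name` dict, and `_parse` are illustrative names, not code from this PR), again assuming primitives serialize to bare strings and complex types to flat dicts:

    ```
    class StringType(object):
        typeName = "string"


    class ArrayType(object):
        typeName = "array"

        def __init__(self, elementType, containsNull=True):
            self.elementType = elementType
            self.containsNull = containsNull

        @classmethod
        def load_from_json(cls, json_value):
            # Recurse on the element type; the flag lives in the same flat dict.
            return cls(_parse(json_value["elementType"]), json_value["containsNull"])


    # Registry keyed by the short type name, used for dispatch.
    _types_by_name = {t.typeName: t for t in (StringType, ArrayType)}


    def _parse(json_value):
        if isinstance(json_value, dict):
            return _types_by_name[json_value["type"]].load_from_json(json_value)
        # A bare string is just a primitive type name.
        return _types_by_name[json_value]()


    # _parse({"type": "array", "elementType": "string", "containsNull": True})
    # would return an ArrayType wrapping a StringType.
    ```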



[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18310445
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -19,71 +19,127 @@ package org.apache.spark.sql.catalyst.types
     
     import java.sql.Timestamp
     
    -import scala.math.Numeric.{FloatAsIfIntegral, BigDecimalAsIfIntegral, DoubleAsIfIntegral}
    +import scala.math.Numeric.{BigDecimalAsIfIntegral, DoubleAsIfIntegral, FloatAsIfIntegral}
     import scala.reflect.ClassTag
    -import scala.reflect.runtime.universe.{typeTag, TypeTag, runtimeMirror}
    +import scala.reflect.runtime.universe.{TypeTag, runtimeMirror, typeTag}
     import scala.util.parsing.combinator.RegexParsers
     
    +import org.json4s.JsonAST.JValue
    +import org.json4s._
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
     import org.apache.spark.sql.catalyst.ScalaReflectionLock
     import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, Expression}
     import org.apache.spark.util.Utils
     
    -/**
    - * Utility functions for working with DataTypes.
    - */
    -object DataType extends RegexParsers {
    -  protected lazy val primitiveType: Parser[DataType] =
    -    "StringType" ^^^ StringType |
    -    "FloatType" ^^^ FloatType |
    -    "IntegerType" ^^^ IntegerType |
    -    "ByteType" ^^^ ByteType |
    -    "ShortType" ^^^ ShortType |
    -    "DoubleType" ^^^ DoubleType |
    -    "LongType" ^^^ LongType |
    -    "BinaryType" ^^^ BinaryType |
    -    "BooleanType" ^^^ BooleanType |
    -    "DecimalType" ^^^ DecimalType |
    -    "TimestampType" ^^^ TimestampType
    -
    -  protected lazy val arrayType: Parser[DataType] =
    -    "ArrayType" ~> "(" ~> dataType ~ "," ~ boolVal <~ ")" ^^ {
    -      case tpe ~ _ ~ containsNull => ArrayType(tpe, containsNull)
    -    }
     
    -  protected lazy val mapType: Parser[DataType] =
    -    "MapType" ~> "(" ~> dataType ~ "," ~ dataType ~ "," ~ boolVal <~ ")" ^^ {
    -      case t1 ~ _ ~ t2 ~ _ ~ valueContainsNull => MapType(t1, t2, valueContainsNull)
    -    }
    +object DataType {
    +  def fromJson(json: String): DataType = parseDataType(parse(json))
     
    -  protected lazy val structField: Parser[StructField] =
    -    ("StructField(" ~> "[a-zA-Z0-9_]*".r) ~ ("," ~> dataType) ~ ("," ~> boolVal <~ ")") ^^ {
    -      case name ~ tpe ~ nullable  =>
    -          StructField(name, tpe, nullable = nullable)
    +  private object JSortedObject {
    +    def unapplySeq(value: JValue): Option[List[(String, JValue)]] = value match {
    +      case JObject(seq) => Some(seq.toList.sortBy(_._1))
    +      case _ => None
         }
    +  }
     
    -  protected lazy val boolVal: Parser[Boolean] =
    -    "true" ^^^ true |
    -    "false" ^^^ false
    +  private def parseDataType(asJValue: JValue): DataType = asJValue match {
    +    case JString("boolean") => BooleanType
    +    case JString("byte") => ByteType
    +    case JString("short") => ShortType
    +    case JString("integer") => IntegerType
    +    case JString("long") => LongType
    +    case JString("float") => FloatType
    +    case JString("double") => DoubleType
    +    case JString("decimal") => DecimalType
    +    case JString("string") => StringType
    +    case JString("binary") => BinaryType
    +    case JString("timestamp") => TimestampType
    +    case JString("null") => NullType
    +    case JObject(List(("array", JSortedObject(
    +        ("containsNull", JBool(n)), ("type", t: JValue))))) =>
    +      ArrayType(parseDataType(t), n)
    +    case JObject(List(("struct", JObject(List(("fields", JArray(fields))))))) =>
    +      StructType(fields.map(parseStructField))
    +    case JObject(List(("map", JSortedObject(
    +        ("key", k: JValue), ("value", v: JValue), ("valueContainsNull", JBool(n)))))) =>
    +      MapType(parseDataType(k), parseDataType(v), n)
    +  }
     
    -  protected lazy val structType: Parser[DataType] =
    -    "StructType\\([A-zA-z]*\\(".r ~> repsep(structField, ",") <~ "))" ^^ {
    -      case fields => new StructType(fields)
    -    }
    +  private def parseStructField(asJValue: JValue): StructField = asJValue match {
    +    case JObject(Seq(("field", JSortedObject(
    +        ("name", JString(name)),
    +        ("nullable", JBool(nullable)),
    +        ("type", dataType: JValue))))) =>
    +      StructField(name, parseDataType(dataType), nullable)
    +  }
     
    -  protected lazy val dataType: Parser[DataType] =
    -    arrayType |
    -      mapType |
    -      structType |
    -      primitiveType
    +  @deprecated("Use DataType.fromJson instead")
    +  def fromCaseClassString(string: String): DataType = CaseClassStringParser(string)
     
       /**
    -   * Parses a string representation of a DataType.
    -   *
    -   * TODO: Generate parser as pickler...
    +   * Utility functions for working with DataTypes.
    --- End diff --
    
    I think this comment is in the wrong place.  We should probably note that this parser is deprecated and is only here for backwards compatibility.  We might even print a warning when it is used so we can get rid of it eventually.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57648676
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21205/consoleFull) for   PR 2563 at commit [`5169238`](https://github.com/apache/spark/commit/51692385ea7c9cde75e37adab776e71d16e26ff3).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58292948
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/285/consoleFull) for   PR 2563 at commit [`fc92eb3`](https://github.com/apache/spark/commit/fc92eb3ad82c998d5f0ea4e94d730a6e90185d9e).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57901579
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/265/consoleFull) for   PR 2563 at commit [`785b683`](https://github.com/apache/spark/commit/785b6834e4f0ea24a3b5be4c55d675b8687b12c9).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58292425
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21437/consoleFull) for   PR 2563 at commit [`fc92eb3`](https://github.com/apache/spark/commit/fc92eb3ad82c998d5f0ea4e94d730a6e90185d9e).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18324999
  
    --- Diff: python/pyspark/sql.py ---
    @@ -205,6 +234,16 @@ def __str__(self):
             return "ArrayType(%s,%s)" % (self.elementType,
                                          str(self.containsNull).lower())
     
    +    simpleString = 'array'
    +
    +    def jsonValue(self):
    +        return {
    +            self.simpleString: {
    +                'type': self.elementType.jsonValue(),
    +                'containsNull': self.containsNull
    +            }
    +        }
    --- End diff --
    
    Any suggestions about indenting and wrapping a complex nested Python data structure like this? I checked PEP 8 while adding these lines, but didn't find any useful guidance for this case.
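
    For reference, two layouts that PEP 8 seems to tolerate for literals like this (illustrative snippets only, not PR code):

    ```
    # Name the inner dict first to keep each literal shallow...
    inner = {
        'type': 'string',
        'containsNull': True,
    }
    nested_named = {'array': inner}

    # ...or use a hanging indent with the closing braces de-dented
    # back under the line that opens each literal.
    nested_hanging = {
        'array': {
            'type': 'string',
            'containsNull': True,
        },
    }

    assert nested_named == nested_hanging
    ```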


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18342875
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -19,71 +19,127 @@ package org.apache.spark.sql.catalyst.types
     
     import java.sql.Timestamp
     
    -import scala.math.Numeric.{FloatAsIfIntegral, BigDecimalAsIfIntegral, DoubleAsIfIntegral}
    +import scala.math.Numeric.{BigDecimalAsIfIntegral, DoubleAsIfIntegral, FloatAsIfIntegral}
     import scala.reflect.ClassTag
    -import scala.reflect.runtime.universe.{typeTag, TypeTag, runtimeMirror}
    +import scala.reflect.runtime.universe.{TypeTag, runtimeMirror, typeTag}
     import scala.util.parsing.combinator.RegexParsers
     
    +import org.json4s.JsonAST.JValue
    +import org.json4s._
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
     import org.apache.spark.sql.catalyst.ScalaReflectionLock
     import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, Expression}
     import org.apache.spark.util.Utils
     
    -/**
    - * Utility functions for working with DataTypes.
    - */
    -object DataType extends RegexParsers {
    -  protected lazy val primitiveType: Parser[DataType] =
    -    "StringType" ^^^ StringType |
    -    "FloatType" ^^^ FloatType |
    -    "IntegerType" ^^^ IntegerType |
    -    "ByteType" ^^^ ByteType |
    -    "ShortType" ^^^ ShortType |
    -    "DoubleType" ^^^ DoubleType |
    -    "LongType" ^^^ LongType |
    -    "BinaryType" ^^^ BinaryType |
    -    "BooleanType" ^^^ BooleanType |
    -    "DecimalType" ^^^ DecimalType |
    -    "TimestampType" ^^^ TimestampType
    -
    -  protected lazy val arrayType: Parser[DataType] =
    -    "ArrayType" ~> "(" ~> dataType ~ "," ~ boolVal <~ ")" ^^ {
    -      case tpe ~ _ ~ containsNull => ArrayType(tpe, containsNull)
    -    }
     
    -  protected lazy val mapType: Parser[DataType] =
    -    "MapType" ~> "(" ~> dataType ~ "," ~ dataType ~ "," ~ boolVal <~ ")" ^^ {
    -      case t1 ~ _ ~ t2 ~ _ ~ valueContainsNull => MapType(t1, t2, valueContainsNull)
    -    }
    +object DataType {
    +  def fromJson(json: String): DataType = parseDataType(parse(json))
     
    -  protected lazy val structField: Parser[StructField] =
    -    ("StructField(" ~> "[a-zA-Z0-9_]*".r) ~ ("," ~> dataType) ~ ("," ~> boolVal <~ ")") ^^ {
    -      case name ~ tpe ~ nullable  =>
    -          StructField(name, tpe, nullable = nullable)
    +  private object JSortedObject {
    +    def unapplySeq(value: JValue): Option[List[(String, JValue)]] = value match {
    +      case JObject(seq) => Some(seq.toList.sortBy(_._1))
    +      case _ => None
         }
    +  }
     
    -  protected lazy val boolVal: Parser[Boolean] =
    -    "true" ^^^ true |
    -    "false" ^^^ false
    +  private def parseDataType(asJValue: JValue): DataType = asJValue match {
    +    case JString("boolean") => BooleanType
    +    case JString("byte") => ByteType
    +    case JString("short") => ShortType
    +    case JString("integer") => IntegerType
    +    case JString("long") => LongType
    +    case JString("float") => FloatType
    +    case JString("double") => DoubleType
    +    case JString("decimal") => DecimalType
    +    case JString("string") => StringType
    +    case JString("binary") => BinaryType
    +    case JString("timestamp") => TimestampType
    +    case JString("null") => NullType
    +    case JObject(List(("array", JSortedObject(
    +        ("containsNull", JBool(n)), ("type", t: JValue))))) =>
    +      ArrayType(parseDataType(t), n)
    +    case JObject(List(("struct", JObject(List(("fields", JArray(fields))))))) =>
    +      StructType(fields.map(parseStructField))
    +    case JObject(List(("map", JSortedObject(
    +        ("key", k: JValue), ("value", v: JValue), ("valueContainsNull", JBool(n)))))) =>
    +      MapType(parseDataType(k), parseDataType(v), n)
    +  }
     
    -  protected lazy val structType: Parser[DataType] =
    -    "StructType\\([A-zA-z]*\\(".r ~> repsep(structField, ",") <~ "))" ^^ {
    -      case fields => new StructType(fields)
    -    }
    +  private def parseStructField(asJValue: JValue): StructField = asJValue match {
    +    case JObject(Seq(("field", JSortedObject(
    +        ("name", JString(name)),
    +        ("nullable", JBool(nullable)),
    +        ("type", dataType: JValue))))) =>
    +      StructField(name, parseDataType(dataType), nullable)
    +  }
     
    -  protected lazy val dataType: Parser[DataType] =
    -    arrayType |
    -      mapType |
    -      structType |
    -      primitiveType
    +  @deprecated("Use DataType.fromJson instead")
    +  def fromCaseClassString(string: String): DataType = CaseClassStringParser(string)
     
       /**
    -   * Parses a string representation of a DataType.
    -   *
    -   * TODO: Generate parser as pickler...
    +   * Utility functions for working with DataTypes.
    --- End diff --
    
    Ah, this comment is a mistake. Instead of printing a warning, I made `fromCaseClassString()` private. It's only referenced by `CaseClassStringParser`, which has already been marked as deprecated.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57648509
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21205/consoleFull) for   PR 2563 at commit [`5169238`](https://github.com/apache/spark/commit/51692385ea7c9cde75e37adab776e71d16e26ff3).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57899622
  
    **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21289/consoleFull)** after     a configured wait of `120m`.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57279975
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/210/consoleFull) for   PR 2563 at commit [`03da3ec`](https://github.com/apache/spark/commit/03da3ec870940bd6ff56e03450993da6125b40a4).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-57897337
  
    @davies Thanks for all the suggestions, they really make things a lot cleaner!


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2563#issuecomment-58291809
  
    Finished rebasing.


[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2563#discussion_r18432650
  
    --- Diff: python/pyspark/sql.py ---
    @@ -312,42 +358,30 @@ def __repr__(self):
             return ("StructType(List(%s))" %
                     ",".join(str(field) for field in self.fields))
     
    +    def jsonValue(self):
    +        return {"type": self.typeName(),
    +                "fields": map(lambda f: f.jsonValue(), self.fields)}
    --- End diff --
    
    A list comprehension is preferred over map and lambda:
    ```
    [f.jsonValue() for f in self.fields]
    ```
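
    Hypothetically, with that change the whole method would read as below (`StructField` is stubbed here just so the snippet runs on its own):

    ```
    class StructField(object):
        def __init__(self, name, dataType, nullable=True):
            self.name = name
            self.dataType = dataType
            self.nullable = nullable

        def jsonValue(self):
            # dataType kept as a plain string just to keep the stub small.
            return {"name": self.name,
                    "type": self.dataType,
                    "nullable": self.nullable}


    class StructType(object):
        def __init__(self, fields):
            self.fields = fields

        @classmethod
        def typeName(cls):
            return "struct"

        def jsonValue(self):
            # List comprehension instead of map + lambda.
            return {"type": self.typeName(),
                    "fields": [f.jsonValue() for f in self.fields]}


    print(StructType([StructField("a", "integer")]).jsonValue())
    ```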

