
Posted to reviews@spark.apache.org by BryanCutler <gi...@git.apache.org> on 2018/01/16 19:38:12 UTC

[GitHub] spark pull request #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to ...

GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/20280

    [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include __from_dict__ flag

    ## What changes were proposed in this pull request?
    
    When a `Row` object is created using kwargs, the order of the keywords cannot be relied upon (except in Python 3.6 and later, where kwargs preserve the order in which they were passed).  The fields are therefore sorted in the constructor, and a `__from_dict__` flag is set to indicate that the object was created from kwargs, so that other areas in Spark can access row data by field name instead of by position.  This change includes the `__from_dict__` flag when pickling a Row that was made from kwargs, so that this behavior is preserved after the Row has been pickled and unpickled.
    
    ## How was this patch tested?
    
    Fixed existing tests that relied on fields and schema being in the same alphabetical order.  Added a new test that creates a `Row` from positional arguments, where order matters.
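
The pickling behavior described in the pull request can be illustrated with a minimal, Spark-free sketch. `SketchRow` and `_restore_row` are hypothetical stand-ins written for this illustration; they are not the actual `pyspark.sql.Row` code or the code in this patch. The sketch only assumes what the description states: kwargs fields are sorted, a `__from_dict__` flag is set, and the flag should survive a pickle round trip.

```python
import pickle


class SketchRow(tuple):
    """Simplified stand-in for a Row-like tuple (hypothetical, not pyspark.sql.Row)."""

    def __new__(cls, *args, **kwargs):
        if args and kwargs:
            raise ValueError("use positional or keyword arguments, not both")
        if kwargs:
            # Keyword order is not relied upon, so fields are sorted by name.
            names = sorted(kwargs)
            row = tuple.__new__(cls, [kwargs[n] for n in names])
            row.__fields__ = names
            row.__from_dict__ = True  # marks "created from kwargs"
            return row
        return tuple.__new__(cls, args)

    def __reduce__(self):
        # Include the field names and the kwargs flag in the pickled form,
        # so both are still present after unpickling.
        fields = getattr(self, "__fields__", None)
        from_dict = getattr(self, "__from_dict__", False)
        return (_restore_row, (tuple(self), fields, from_dict))


def _restore_row(values, fields, from_dict):
    """Rebuild a SketchRow from its pickled parts (hypothetical helper)."""
    row = tuple.__new__(SketchRow, values)
    if fields is not None:
        row.__fields__ = fields
    if from_dict:
        row.__from_dict__ = True
    return row


restored = pickle.loads(pickle.dumps(SketchRow(b=2, a=1)))
print(restored.__fields__, tuple(restored), restored.__from_dict__)
```

A row built from positional arguments, such as `SketchRow(2, 1)`, keeps its values in the order given and comes back from the round trip without the flag.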

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-Row-serialize-SPARK-22232

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20280
    

---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    Thanks @HyukjinKwon and @felixcheung. I'm a bit worried too that this might break someone's code, but it doesn't affect `createDataFrame` from `Row`s; it only matters when the Row is serialized, such as when going from an RDD of Rows with `toDF`.  Even then the schema gets alphabetized, which I'm sure users would agree is strange.
    
    I'm not sure about adding a config switch. It might be a little hard to add, and it could be confusing to explain to users that it only applies when the Row is serialized and that the schema would need to be sorted by the original Row keywords.
    
    I'll go ahead and update the migration guide, and expand on the PR description to hopefully make the change as clear as possible.


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86205/


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    **[Test build #89530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89530/testReport)** for PR 20280 at commit [`10bf2d0`](https://github.com/apache/spark/commit/10bf2d094b29b4e8ef7a38693f3956f96c0e9f7e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    closing now, will revisit for Spark 3.0


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    That would probably work, but it would also be trickier to add and later remove that configuration. Another option might be to just close this for now and target it for 3.0.0, since we have already started talking about it.


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    Holding off is fine; however, I am less sure about the configuration, if that's not something you guys feel strongly about.


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    @HyukjinKwon, I was also thinking about holding off on this until 3.0.0 and then making a clean switch.  What do you think about that, @holdenk?


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    **[Test build #89530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89530/testReport)** for PR 20280 at commit [`10bf2d0`](https://github.com/apache/spark/commit/10bf2d094b29b4e8ef7a38693f3956f96c0e9f7e).


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    **[Test build #86193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86193/testReport)** for PR 20280 at commit [`315b8de`](https://github.com/apache/spark/commit/315b8de0fb3e7277b895b98769e52da7aaae32d6).


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86204/


---


[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    I'm kinda worried that the example you give above is actually fairly common: construct with kwargs, and then (re-)name the columns.
    
    Perhaps it's worthwhile to consider a config switch?
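
The pattern raised here, constructing with kwargs and then renaming columns by position, can be sketched in plain Python without Spark. The helpers `kwargs_row` and `rename_by_position` are hypothetical names for this illustration only; they mimic the sorting of kwargs fields and a positional rename, and are not Spark APIs.

```python
def kwargs_row(**kwargs):
    """Mimic a kwargs-built row: return (field_names, values) sorted by field name."""
    names = sorted(kwargs)
    return names, tuple(kwargs[n] for n in names)


def rename_by_position(values, new_names):
    """Mimic a positional rename: pair each value with the name at the same index."""
    return dict(zip(new_names, values))


fields, values = kwargs_row(name="Alice", age=5)
# The fields come back alphabetized, not in the order they were written.
print(fields, values)  # ['age', 'name'] (5, 'Alice')

# A caller who assumes the written order (name, age) mislabels the data.
print(rename_by_position(values, ["name", "age"]))  # {'name': 5, 'age': 'Alice'}

# Code that already accounts for the alphabetical order gets correct labels,
# which is the kind of code a change in ordering behavior could break.
print(rename_by_position(values, ["age", "name"]))  # {'age': 5, 'name': 'Alice'}
```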



---
