You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2014/09/12 01:48:28 UTC

[GitHub] spark pull request: [SPARK-2951] support unpickle array.array for ...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/2365

    [SPARK-2951] support unpickle array.array for Python 2.6

    Pyrolite can not unpickle array.array which pickled by Python 2.6, this patch fix it by extend Pyrolite.
    
    There is a bug in Pyrolite when unpickle array of float/double, this patch workaround it by reverse the endianness for float/double. This workaround should be removed after Pyrolite have a new release to fix this issue.
    
    I had send an PR to Pyrolite to fix it:  https://github.com/irmen/Pyrolite/pull/11  

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark pickle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2365
    
----
commit 60e4e2f8e0897555af6e657967bd39c150fc4726
Author: Davies Liu <da...@gmail.com>
Date:   2014-09-11T23:43:24Z

    support unpickle array.array for Python 2.6

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55540972
  
    where's the message that was sent to the pyrolite folks?
    
    it looks like SPARK-2378 s targetted for 1.2, so it has a bit of time


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55582578
  
    thanks, +1, lgtm


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] support unpickle array.array for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55345197
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20181/consoleFull) for   PR 2365 at commit [`60e4e2f`](https://github.com/apache/spark/commit/60e4e2f8e0897555af6e657967bd39c150fc4726).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55551408
  
    I had created https://issues.apache.org/jira/browse/SPARK-3524 to track this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55349723
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20181/consoleFull) for   PR 2365 at commit [`60e4e2f`](https://github.com/apache/spark/commit/60e4e2f8e0897555af6e657967bd39c150fc4726).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class ArrayConstructor extends net.razorvine.pickle.objects.ArrayConstructor `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55349565
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/74/consoleFull) for   PR 2365 at commit [`60e4e2f`](https://github.com/apache/spark/commit/60e4e2f8e0897555af6e657967bd39c150fc4726).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class ArrayConstructor extends net.razorvine.pickle.objects.ArrayConstructor `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55480989
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20249/consoleFull) for   PR 2365 at commit [`f44f771`](https://github.com/apache/spark/commit/f44f7715d67b6dd99dbabc2d5b75982fdb21eb97).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class ArrayConstructor extends net.razorvine.pickle.objects.ArrayConstructor `
      * `        class Dummy(object):`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55479757
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20249/consoleFull) for   PR 2365 at commit [`f44f771`](https://github.com/apache/spark/commit/f44f7715d67b6dd99dbabc2d5b75982fdb21eb97).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55672567
  
    LGTM  Ran relevant python unit tests with no problems.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55507864
  
    @mattf I just found out that this is blocking #2378, which is blocking other MLlib Python API patches, so I'm going to consider merging this now...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55540988
  
    if you do end up merging, what do you think about logging an issue for fixing up the workaround once pyrolite is update?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2365#discussion_r17509580
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala ---
    @@ -28,6 +30,56 @@ import org.apache.spark.rdd.RDD
     
     /** Utilities for serialization / deserialization between Python and Java, using Pickle. */
     private[python] object SerDeUtil extends Logging {
    +  // Unpickle array.array generated by Python 2.6
    +  class ArrayConstructor extends net.razorvine.pickle.objects.ArrayConstructor {
    +    //  /* Description of types */
    +    //  static struct arraydescr descriptors[] = {
    +    //    {'c', sizeof(char), c_getitem, c_setitem},
    +    //    {'b', sizeof(char), b_getitem, b_setitem},
    +    //    {'B', sizeof(char), BB_getitem, BB_setitem},
    +    //    #ifdef Py_USING_UNICODE
    +    //      {'u', sizeof(Py_UNICODE), u_getitem, u_setitem},
    +    //    #endif
    +    //    {'h', sizeof(short), h_getitem, h_setitem},
    +    //    {'H', sizeof(short), HH_getitem, HH_setitem},
    +    //    {'i', sizeof(int), i_getitem, i_setitem},
    +    //    {'I', sizeof(int), II_getitem, II_setitem},
    +    //    {'l', sizeof(long), l_getitem, l_setitem},
    +    //    {'L', sizeof(long), LL_getitem, LL_setitem},
    +    //    {'f', sizeof(float), f_getitem, f_setitem},
    +    //    {'d', sizeof(double), d_getitem, d_setitem},
    +    //    {'\0', 0, 0, 0} /* Sentinel */
    +    //  };
    +    // TODO: support Py_UNICODE with 2 bytes
    +    // FIXME: unpickle array of float is wrong in Pyrolite, so we reverse the
    +    // machine code for float/double here to work arround it.
    +    // we should fix this after Pyrolite fix them
    +    val machineCodes: Map[Char, Int] = if (ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN)) {
    +      Map('c' -> 1, 'B' -> 0, 'b' -> 1, 'H' -> 3, 'h' -> 5, 'I' -> 7, 'i' -> 9,
    +        'L' -> 11, 'l' -> 13, 'f' -> 14, 'd' -> 16, 'u' -> 21
    +      )
    +    } else {
    +      Map('c' -> 1, 'B' -> 0, 'b' -> 1, 'H' -> 2, 'h' -> 4, 'I' -> 6, 'i' -> 8,
    +        'L' -> 10, 'l' -> 12, 'f' -> 15, 'd' -> 17, 'u' -> 20
    +      )
    +    }
    +    override def construct(args: Array[Object]): Object = {
    +      if (args.length == 1) {
    +        construct(args ++ Array(""))
    +      } else if (args.length == 2 && args(1).isInstanceOf[String]) {
    +        val typecode = args(0).asInstanceOf[String].charAt(0)
    +        val data: String = args(1).asInstanceOf[String]
    +        println(typecode, machineCodes(typecode), data.length, data.toList)
    --- End diff --
    
    Can you remove this debugging statement?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55345460
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/74/consoleFull) for   PR 2365 at commit [`60e4e2f`](https://github.com/apache/spark/commit/60e4e2f8e0897555af6e657967bd39c150fc4726).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2365


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55470194
  
    Maybe we should wait a couple of days to hear back from the Pyrolite folks and see if they will cut a new release.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55685369
  
    I'm going to merge this now.
    
    As a reference / side note, http://bugs.python.org/issue2389 provides some good context for why Python 2.6's array pickling exposes the machine's endianness; Python 2.7 uses a different pickling strategy that doesn't go through this code path.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2365#issuecomment-55495366
  
    > Maybe we should wait a couple of days to hear back from the Pyrolite folks and see if they will cut a new release.
    
    +1
    
    unless there's agreed upon urgency, it'd be better to fix the dep instead of working around it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org