Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2015/04/18 09:27:37 UTC

[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/5570

    [SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression

    This PR enables auto_convert in JavaGateway, so that we can register a converter for a given type, for example, date and datetime.
    
    There are two bugs related to auto_convert (see [1] and [2]); we work around them in this PR.
    
    [1] https://github.com/bartdag/py4j/issues/160
    [2] https://github.com/bartdag/py4j/issues/161
    
    cc @rxin @JoshRosen 
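
The converter idea described above can be sketched roughly as follows. This is not the code from the PR: the class names mirror the ones reported later in the thread, and the `can_convert`/`convert` method pair follows the py4j input-converter convention, but the tagged tuples, the `CONVERTERS` list, and the `auto_convert` helper are hypothetical stand-ins so the example runs without a JVM or py4j installed.

```python
import datetime


class DateConverter(object):
    """Illustrative converter for datetime.date values (hypothetical body)."""

    def can_convert(self, obj):
        return isinstance(obj, datetime.date)

    def convert(self, obj, gateway_client):
        # A real converter would build a JVM object through gateway_client;
        # this sketch returns a tagged tuple instead (assumption).
        return ("java.sql.Date", obj.strftime("%Y-%m-%d"))


class DatetimeConverter(object):
    """Illustrative converter for datetime.datetime values (hypothetical body)."""

    def can_convert(self, obj):
        return isinstance(obj, datetime.datetime)

    def convert(self, obj, gateway_client):
        # Tagged tuple stands in for a JVM timestamp object (assumption).
        return ("java.sql.Timestamp", obj.strftime("%Y-%m-%d %H:%M:%S"))


# datetime.datetime is a subclass of datetime.date, so the more specific
# converter has to be tried first or every datetime would match DateConverter.
CONVERTERS = [DatetimeConverter(), DateConverter()]


def auto_convert(obj, gateway_client=None):
    """Hypothetical dispatcher: apply the first matching converter, else pass through."""
    for converter in CONVERTERS:
        if converter.can_convert(obj):
            return converter.convert(obj, gateway_client)
    return obj
```

The ordering of `CONVERTERS` is the main design point of the sketch: because a `datetime.datetime` instance also passes an `isinstance(obj, datetime.date)` check, the datetime converter must take precedence.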

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark py4j_date

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5570
    
----
commit cb094ff5eed6938c89cd96c70cd42eaf9ed77894
Author: Davies Liu <da...@databricks.com>
Date:   2015-04-18T05:32:04Z

    enable auto convert

commit 3c373f3bc25b90d0e77708983c0f1d8cbb8cc5a6
Author: Davies Liu <da...@databricks.com>
Date:   2015-04-18T06:48:09Z

    support date and datetime by auto_convert

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94222407
  
      [Test build #30541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30541/consoleFull) for   PR 5570 at commit [`d17d634`](https://github.com/apache/spark/commit/d17d6343f0a6f834a8f910ee2bb4fabd25927d14).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class DateConverter(object):`
      * `class DatetimeConverter(object):`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94222417
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30541/
    Test FAILed.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94139809
  
      [Test build #30514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30514/consoleFull) for   PR 5570 at commit [`3c373f3`](https://github.com/apache/spark/commit/3c373f3bc25b90d0e77708983c0f1d8cbb8cc5a6).




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94215816
  
      [Test build #30541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30541/consoleFull) for   PR 5570 at commit [`d17d634`](https://github.com/apache/spark/commit/d17d6343f0a6f834a8f910ee2bb4fabd25927d14).




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94244420
  
      [Test build #30545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30545/consoleFull) for   PR 5570 at commit [`eb4fa53`](https://github.com/apache/spark/commit/eb4fa533284f6f072cbafc6ded5e622374f4b7bf).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class DateConverter(object):`
      * `class DatetimeConverter(object):`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5570#discussion_r28642185
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2267,6 +2267,8 @@ def _prepare_for_python_RDD(sc, command, obj=None):
             # The broadcast will have same life cycle as created PythonRDD
             broadcast = sc.broadcast(pickled_command)
             pickled_command = ser.dumps(broadcast)
    +    # There is a bug in py4j.java_gateway.JavaClass with auto_convert
    --- End diff --
    
    Added a link here.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94190966
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30525/
    Test FAILed.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94215598
  
    Looks like the change broke something in MLlib.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94661265
  
    Thanks. I'm going to merge this in master.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94244424
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30545/
    Test PASSed.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94186349
  
      [Test build #30525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30525/consoleFull) for   PR 5570 at commit [`2e7566d`](https://github.com/apache/spark/commit/2e7566dcc4136e97f34b5558209199445398e605).




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94144975
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30514/
    Test FAILed.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94235840
  
      [Test build #30545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30545/consoleFull) for   PR 5570 at commit [`eb4fa53`](https://github.com/apache/spark/commit/eb4fa533284f6f072cbafc6ded5e622374f4b7bf).




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94145069
  
      [Test build #30515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30515/consoleFull) for   PR 5570 at commit [`ceb3779`](https://github.com/apache/spark/commit/ceb3779bbe150793494820ec42d44b5f771778ee).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class DateConverter(object):`
      * `class DatetimeConverter(object):`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5570




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94190959
  
      [Test build #30525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30525/consoleFull) for   PR 5570 at commit [`2e7566d`](https://github.com/apache/spark/commit/2e7566dcc4136e97f34b5558209199445398e605).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class DateConverter(object):`
      * `class DatetimeConverter(object):`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94144970
  
      [Test build #30514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30514/consoleFull) for   PR 5570 at commit [`3c373f3`](https://github.com/apache/spark/commit/3c373f3bc25b90d0e77708983c0f1d8cbb8cc5a6).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class DateConverter(object):`
      * `class DatetimeConverter(object):`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94140489
  
      [Test build #30515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30515/consoleFull) for   PR 5570 at commit [`ceb3779`](https://github.com/apache/spark/commit/ceb3779bbe150793494820ec42d44b5f771778ee).




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94145073
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30515/
    Test FAILed.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5570#issuecomment-94139985
  
    @JoshRosen can you take a look at this? I don't really know the py4j stuff.




[GitHub] spark pull request: [SPARK-6949] [SQL] [PySpark] Support Date/Time...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5570#discussion_r28642177
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2267,6 +2267,8 @@ def _prepare_for_python_RDD(sc, command, obj=None):
             # The broadcast will have same life cycle as created PythonRDD
             broadcast = sc.broadcast(pickled_command)
             pickled_command = ser.dumps(broadcast)
    +    # There is a bug in py4j.java_gateway.JavaClass with auto_convert
    --- End diff --
    
    can you document what bug it is?

