Posted to reviews@spark.apache.org by rberenguel <gi...@git.apache.org> on 2017/05/29 15:55:51 UTC

[GitHub] spark pull request #18137: [SPARK-20787][PYTHON] PySpark can't handle dateti...

GitHub user rberenguel opened a pull request:

    https://github.com/apache/spark/pull/18137

    [SPARK-20787][PYTHON] PySpark can't handle datetimes before 1900 

    `time.mktime` can't handle dates in the years 100 to 1899; according to the documentation, this is by design. `calendar.timegm` is equivalent in the shared cases, but can also handle those years.
    
    ## What changes were proposed in this pull request?
    
    Replace `time.mktime` with the more capable `calendar.timegm` to address cases like:
    ```python
    import datetime as dt
    sqlContext.createDataFrame(sc.parallelize([[dt.datetime(1899,12,31)]])).count()
    ```
    which fail because of an internal conversion failure when the datetime object carries no timezone information. When timezone information is present, `calendar` was already used instead.
    
    ## How was this patch tested?
    
    The existing test cases should cover this change, since it should not change any existing functionality.
    
    This PR is original work from me and I license this work to the Spark project
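
As a minimal sketch of the conversion the description above refers to (not the code from the patch itself), the snippet below feeds a naive pre-1900 datetime through `calendar.timegm`. The helper name `naive_to_epoch_utc` is hypothetical. Note that `calendar.timegm` reads the time tuple as UTC, whereas `time.mktime` reads it as local time, so the two only agree when the local zone is UTC.

```python
import calendar
import datetime as dt


def naive_to_epoch_utc(d):
    """Return seconds since the Unix epoch, reading d's fields as UTC.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    # timetuple() on a naive datetime just exposes its calendar fields;
    # timegm() does pure arithmetic on them, so pre-1900 years work.
    return calendar.timegm(d.timetuple())


print(naive_to_epoch_utc(dt.datetime(1970, 1, 1)))    # 0
print(naive_to_epoch_utc(dt.datetime(1899, 12, 31)))  # -2209075200
```

Because the result is anchored to UTC rather than to the local zone, a drop-in swap for `time.mktime` would presumably still need to account for the local offset.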

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rberenguel/spark SPARK-20787-invalid-years

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18137.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18137
    
----
commit 6c0312f94e3fce2bf4d6a30055bd747be535bb0f
Author: Ruben Berenguel Montoro <ru...@mostlymaths.net>
Date:   2017-05-29T15:46:21Z

    SPARK-20787 time.mktime can’t handle dates from 1899-100, by construction. calendar.timegm is equivalent in shared cases, but can handle those

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18137: [SPARK-20787][PYTHON] PySpark can't handle dateti...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel closed the pull request at:

    https://github.com/apache/spark/pull/18137




[GitHub] spark issue #18137: [SPARK-20787][PYTHON] PySpark can't handle datetimes bef...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/18137
  
    Closing while I fight with an issue seemingly related to DST between gmtime and mktime
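
One plausible source of such a mismatch, offered here only as an illustrative sketch and not as a diagnosis of the actual problem, is that `time.mktime` pairs with `time.localtime` while `calendar.timegm` pairs with `time.gmtime`. Mixing the two pairs shifts the result by the local UTC offset, which includes DST when it is in effect. The function names below are hypothetical.

```python
import calendar
import time


def roundtrip_utc(ts):
    # gmtime() and timegm() are inverses: both work in UTC.
    return calendar.timegm(time.gmtime(ts))


def roundtrip_local(ts):
    # localtime() and mktime() are inverses: both work in local time,
    # and tm_isdst carries the DST state through the round trip.
    return int(time.mktime(time.localtime(ts)))


def local_utc_offset(ts):
    # Mixing the pairs: reading a local-time tuple as if it were UTC
    # yields the local offset from UTC (DST included) at that instant.
    return calendar.timegm(time.localtime(ts)) - ts


ts = 1496073351  # an arbitrary timestamp in 2017
print(roundtrip_utc(ts) == ts, roundtrip_local(ts) == ts)
print(local_utc_offset(ts))  # depends on the machine's time zone
```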




[GitHub] spark issue #18137: [SPARK-20787][PYTHON] PySpark can't handle datetimes bef...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18137
  
    Can one of the admins verify this patch?

