Posted to commits@spark.apache.org by we...@apache.org on 2017/03/09 18:34:57 UTC
spark git commit: [SPARK-19561][SQL] add int case handling for TimestampType
Repository: spark
Updated Branches:
refs/heads/master 274973d2a -> 206030bd1
[SPARK-19561][SQL] add int case handling for TimestampType
## What changes were proposed in this pull request?
Add handling of input of type `Int` for dataType `TimestampType` to `EvaluatePython.scala`. Py4J serializes ints smaller than MIN_INT or larger than MAX_INT to Long, which are handled correctly already, but values between MIN_INT and MAX_INT are serialized to Int.
These range limits correspond to roughly half an hour on either side of the epoch. As a result, PySpark doesn't allow TimestampType values to be created in this range.
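The "roughly half an hour" figure can be checked with a quick back-of-envelope calculation (assuming, as the change implies, that Spark stores TimestampType internally as microseconds since the epoch and that Py4J's int/long cutoff is at 2^31):

```python
# MAX_INT is the largest value Py4J sends over the wire as a Java int.
MAX_INT = 2**31 - 1

# Interpreted as microseconds since the epoch, how far does that reach?
minutes = MAX_INT / 1_000_000 / 60  # microseconds -> seconds -> minutes
print(round(minutes, 1))  # ~35.8 minutes on either side of the epoch
```

So any timestamp within about 36 minutes of 1970-01-01 00:00:00 UTC serializes as an Int rather than a Long, which is exactly the window the missing match case covered.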
Alternative attempted: patching the `TimestampType.toInternal` function to cast its return value to `long`, so Py4J would always serialize it to a Scala Long. Python 3 does not have a `long` type, so this approach failed on Python 3.
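A minimal sketch of why the rejected approach breaks on Python 3 (the helper below is illustrative, not the actual PySpark `toInternal` implementation, which differs in how it handles time zones):

```python
import calendar
import datetime

def to_internal_sketch(dt):
    # Illustrative stand-in for TimestampType.toInternal: microseconds
    # since the epoch, returned as a plain Python int.
    seconds = calendar.timegm(dt.utctimetuple())
    return seconds * 1_000_000 + dt.microsecond

# A timestamp half an hour after the epoch lands inside Py4J's int range,
# so Py4J ships it to the JVM as an Int, not a Long.
half_hour = datetime.datetime(1970, 1, 1, 0, 30)
print(to_internal_sketch(half_hour) < 2**31)  # True

# On Python 2 one could force a Long with `long(to_internal_sketch(dt))`;
# Python 3 removed the `long` builtin, so that cast raises NameError there.
```

Hence the fix lands on the Scala side instead: accept the Int and widen it to Long in `EvaluatePython.scala`.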
## How was this patch tested?
Added a new PySpark-side test that fails without the change.
The contribution is my original work and I license the work to the project under the project's open source license.
Resubmission of https://github.com/apache/spark/pull/16896. The original PR didn't go through Jenkins and broke the build. davies dongjoon-hyun
cloud-fan Could you kick off a Jenkins run for me? It passed everything for me locally, but it's possible something has changed in the last few weeks.
Author: Jason White <ja...@shopify.com>
Closes #17200 from JasonMWhite/SPARK-19561.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/206030bd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/206030bd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/206030bd
Branch: refs/heads/master
Commit: 206030bd12405623c00c1ff334663984b9250adb
Parents: 274973d
Author: Jason White <ja...@shopify.com>
Authored: Thu Mar 9 10:34:54 2017 -0800
Committer: Wenchen Fan <we...@databricks.com>
Committed: Thu Mar 9 10:34:54 2017 -0800
----------------------------------------------------------------------
python/pyspark/sql/tests.py | 8 ++++++++
.../apache/spark/sql/execution/python/EvaluatePython.scala | 2 ++
2 files changed, 10 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/206030bd/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 81f3d1d..1b873e9 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -1555,6 +1555,14 @@ class SQLTests(ReusedPySparkTestCase):
         self.assertEqual(now, now1)
         self.assertEqual(now, utcnow1)
 
+    # regression test for SPARK-19561
+    def test_datetime_at_epoch(self):
+        epoch = datetime.datetime.fromtimestamp(0)
+        df = self.spark.createDataFrame([Row(date=epoch)])
+        first = df.select('date', lit(epoch).alias('lit_date')).first()
+        self.assertEqual(first['date'], epoch)
+        self.assertEqual(first['lit_date'], epoch)
+
     def test_decimal(self):
         from decimal import Decimal
         schema = StructType([StructField("decimal", DecimalType(10, 5))])
http://git-wip-us.apache.org/repos/asf/spark/blob/206030bd/sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala
index 46fd54e..fcd8470 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala
@@ -112,6 +112,8 @@ object EvaluatePython {
     case (c: Int, DateType) => c
     case (c: Long, TimestampType) => c
+    // Py4J serializes values between MIN_INT and MAX_INT as Ints, not Longs
+    case (c: Int, TimestampType) => c.toLong
     case (c, StringType) => UTF8String.fromString(c.toString)
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org