Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/26 17:59:36 UTC

[GitHub] [spark] marmbrus commented on a change in pull request #27710: [SPARK-30960][SQL] add back the legacy date/timestamp format support in CSV/JSON parser

marmbrus commented on a change in pull request #27710: [SPARK-30960][SQL] add back the legacy date/timestamp format support in CSV/JSON parser
URL: https://github.com/apache/spark/pull/27710#discussion_r384665169
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##########
 @@ -175,10 +175,30 @@ class UnivocityParser(
       }
 
     case _: TimestampType => (d: String) =>
-      nullSafeDatum(d, name, nullable, options)(timestampFormatter.parse)
+      nullSafeDatum(d, name, nullable, options) { datum =>
+        try {
+          timestampFormatter.parse(datum)
+        } catch {
+          case NonFatal(e) =>
 +            // If it fails to parse, then try the approach used in 2.0 and 1.x
 +            // for backwards compatibility.
 +            val str = UTF8String.fromString(datum)
 +            DateTimeUtils.stringToTimestamp(str, options.zoneId).getOrElse(throw e)
 +        }
 +      }
 
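A minimal, self-contained sketch of the fallback shape the added lines implement. The object and method names here are illustrative only, and `java.sql.Timestamp.valueOf` stands in for Spark's `DateTimeUtils.stringToTimestamp`; the point is the try-strict-then-fall-back-then-rethrow-the-original-error structure:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

import scala.util.control.NonFatal

object FallbackTimestampParse {
  // Strict java.time-based parsing: stands in for the new TimestampFormatter.
  private val strict = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss")

  // Try the strict formatter first; if it throws, fall back to a lenient
  // legacy-style parser, and rethrow the *original* error only when the
  // fallback also fails -- the same shape as the diff's getOrElse(throw e).
  def parseWithFallback(datum: String): java.sql.Timestamp =
    try {
      java.sql.Timestamp.valueOf(LocalDateTime.parse(datum, strict))
    } catch {
      case NonFatal(e) =>
        // Lenient fallback: accepts the legacy "yyyy-MM-dd HH:mm:ss" form.
        try java.sql.Timestamp.valueOf(datum)
        catch { case NonFatal(_) => throw e }
    }

  def main(args: Array[String]): Unit = {
    println(parseWithFallback("2020-02-26T17:59:36")) // strict path
    println(parseWithFallback("2020-02-26 17:59:36")) // legacy fallback path
  }
}
```
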
 Review comment:
    - What cases used to be supported that are no longer supported?
    - What remains in the 1%?
    - Why can we not get 100%?
   
   To me, Spark suddenly failing to parse timestamps that it used to parse falls into the "extremely annoying" category of regressions. Working with timestamps is already hard and confusing. If my job suddenly starts returning `null` where it used to work as expected, it's:
    - Probably going to take a while for people to figure out (while the `null` partition of their table mysteriously fills up).
    - Probably non-trivial for them to fix. (I don't think constructing / testing format strings is a trivial activity.)
   
   I really question why we changed these APIs rather than creating new, better ones. These changes broke my pipeline, and it took me hours plus access to a team of Spark committers to figure out why!
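   
   For anyone bitten by this in the meantime, here is a hedged sketch of two user-level workarounds. Both the `spark.sql.legacy.timeParserPolicy` conf (added by SPARK-30668) and the per-source `timestampFormat` read option are assumptions about the Spark 3.0 surface; verify both against the build you actually run. The path is a placeholder:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   object LegacyTimestampWorkaround {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[*]")
         .appName("legacy-timestamp-workaround")
         .getOrCreate()
   
       // Assumption: setting this to LEGACY restores the 2.x parsing behavior
       // globally (conf introduced by SPARK-30668; check the name and values
       // on your version).
       spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
   
       // Alternative: pin an explicit pattern per read instead of relying on
       // the default. "/path/to/data.csv" is a placeholder.
       val df = spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
         .csv("/path/to/data.csv")
   
       df.printSchema()
       spark.stop()
     }
   }
   ```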

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org