Posted to reviews@spark.apache.org by rdblue <gi...@git.apache.org> on 2015/04/10 18:41:36 UTC

[GitHub] spark pull request: [SPARK-4985][SQL Parquet] Parquet date support

Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3855#discussion_r28160082
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
    @@ -207,6 +207,7 @@ private[parquet] class RowWriteSupport extends WriteSupport[Row] with Logging {
             case DoubleType => writer.addDouble(value.asInstanceOf[Double])
             case FloatType => writer.addFloat(value.asInstanceOf[Float])
             case BooleanType => writer.addBoolean(value.asInstanceOf[Boolean])
    +        case DateType => writer.addInteger(value.asInstanceOf[java.sql.Date].getTime.toInt)
    --- End diff --
    
    This doesn't conform to the [Parquet specification for date](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date) and produces invalid data.
    
    When using the `DATE` annotation, the value must be the number of days since the Unix epoch, 1 January 1970. `java.sql.Date` and `java.util.Date` are backed by a long timestamp: the number of milliseconds since the Unix epoch, which corresponds to Parquet's `TIMESTAMP_MILLIS`. Casting that value to an integer keeps only the low 32 bits, which makes it impossible to recover the real date.
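    
    To make the problem concrete, here is a minimal sketch of the difference between truncating the millisecond timestamp and computing days since the epoch. The object and method names are hypothetical and not part of the pull request, and the conversion assumes the timestamp is a UTC instant (a `java.sql.Date` built in a non-UTC default time zone would need its offset applied first). It also assumes Java 8 or later for `Math.floorDiv`.

    ```scala
    import java.util.concurrent.TimeUnit

    // Hypothetical helper for illustration only; not code from the pull request.
    object DateEncodingSketch {
      private val MillisPerDay: Long = TimeUnit.DAYS.toMillis(1)

      // What the diff effectively does: keep only the low 32 bits of the timestamp.
      def truncatedMillis(millis: Long): Int = millis.toInt

      // Days since 1970-01-01, assuming `millisUtc` is a UTC instant.
      // floorDiv rounds toward negative infinity, so pre-epoch instants
      // map to the earlier day instead of rounding toward zero.
      def daysSinceEpoch(millisUtc: Long): Int =
        Math.floorDiv(millisUtc, MillisPerDay).toInt
    }
    ```

    Two timestamps that are 2^32 milliseconds apart (roughly 49.7 days) truncate to the same `Int`, so a reader cannot tell them apart, whereas the day count fits comfortably in 32 bits.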
    
    I recommend using `TIMESTAMP_MILLIS` instead of `DATE` here (you won't need the `toInt` part). That seems to be what you want if you're interested in using `java.sql.Date`. The names don't match because the Parquet types mirror SQL types more closely than Java objects.
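    
    As a rough sketch of that alternative, the full millisecond value can be written as a 64-bit long and read back without loss. `LongSink` below is a hypothetical stand-in for the record consumer used in the diff, and the object and method names are assumptions for illustration, not the actual Spark or Parquet API.

    ```scala
    import java.sql.Date
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical sketch; not code from the pull request.
    object TimestampMillisSketch {
      // Stand-in for a writer that accepts 64-bit values.
      trait LongSink { def addLong(value: Long): Unit }

      // Simple in-memory sink so the round trip can be checked.
      final class BufferSink extends LongSink {
        val values: ArrayBuffer[Long] = ArrayBuffer.empty[Long]
        override def addLong(value: Long): Unit = values += value
      }

      // Write the full millisecond timestamp; no narrowing to Int.
      def writeDate(sink: LongSink, date: Date): Unit = sink.addLong(date.getTime)

      // Rebuild the date from the stored milliseconds.
      def readDate(millis: Long): Date = new Date(millis)
    }
    ```

    Because nothing is narrowed, the value read back has the same `getTime` as the value written.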

