Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:25:14 UTC

[jira] [Updated] (SPARK-14428) [SQL] Allow more flexibility when parsing dates and timestamps in json datasources

     [ https://issues.apache.org/jira/browse/SPARK-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-14428:
---------------------------------
    Labels: bulk-closed date features json timestamp  (was: date features json timestamp)

> [SQL] Allow more flexibility when parsing dates and timestamps in json datasources
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-14428
>                 URL: https://issues.apache.org/jira/browse/SPARK-14428
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Michel Lemay
>            Priority: Minor
>              Labels: bulk-closed, date, features, json, timestamp
>
> Reading JSON with dates and timestamps is limited to predetermined string formats or long values.
> 1) It should be possible to set an option on the JSON data source to parse dates and timestamps using a custom string format.
> 2) It should be possible to change how long values since the epoch are interpreted. This could support different precisions such as days, seconds, milliseconds, microseconds and nanoseconds.
> Something along the lines of:
> {code}
> object Precision extends Enumeration {
>   val days, seconds, milliseconds, microseconds, nanoseconds = Value
> }
> def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = ...
> ...
>   val dateFormat = parameters.getOrElse("dateFormat", "").trim
>   val timestampFormat = parameters.getOrElse("timestampFormat", "").trim
>   // Parsed into Precision values so they can be passed directly to convertWithPrecision.
>   val longDatePrecision = Precision.withName(parameters.getOrElse("longDatePrecision", "days"))
>   val longTimestampPrecision = Precision.withName(parameters.getOrElse("longTimestampPrecision", "milliseconds"))
> {code}
> and 
> {code}
>       case (VALUE_STRING, DateType) =>
>         val stringValue = parser.getText
>         val days = if (configOptions.dateFormat.nonEmpty) {
>           // User-defined format; the parsed value is converted to the SQL DATE representation (number of days)
>           val sdf = new SimpleDateFormat(configOptions.dateFormat) // Not thread safe.
>           DateTimeUtils.convertWithPrecision(sdf.parse(stringValue).getTime, Precision.milliseconds, Precision.days)
>         } else if (stringValue.forall(_.isDigit)) {
>           DateTimeUtils.convertWithPrecision(stringValue.toLong, configOptions.longDatePrecision, Precision.days)
>         } else {
>           // The format of this string will probably be "yyyy-MM-dd".
>           DateTimeUtils.convertWithPrecision(DateTimeUtils.stringToTime(stringValue).getTime, Precision.milliseconds, Precision.days)
>         }
>         days.toInt
>       case (VALUE_NUMBER_INT, DateType) =>
>         DateTimeUtils.convertWithPrecision(parser.getLongValue, configOptions.longDatePrecision, Precision.days).toInt
> {code}
> With similar handling for Timestamps.
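
The description leaves the body of convertWithPrecision elided ("= ..."). One possible way to fill it in, offered only as a sketch and not as what the reporter or Spark intended, is to map each Precision value onto java.util.concurrent.TimeUnit and let TimeUnit.convert do the arithmetic. The PrecisionSketch object and the toTimeUnit helper are hypothetical names introduced for illustration; Precision is repeated from the description so the snippet stands alone.

```scala
import java.util.concurrent.TimeUnit

// Repeated from the issue description so the sketch is self-contained.
object Precision extends Enumeration {
  val days, seconds, milliseconds, microseconds, nanoseconds = Value
}

// Hypothetical holder object; the issue places this method in DateTimeUtils.
object PrecisionSketch {
  // Hypothetical helper: map each proposed precision onto the JDK's TimeUnit.
  private def toTimeUnit(p: Precision.Value): TimeUnit = p match {
    case Precision.days         => TimeUnit.DAYS
    case Precision.seconds      => TimeUnit.SECONDS
    case Precision.milliseconds => TimeUnit.MILLISECONDS
    case Precision.microseconds => TimeUnit.MICROSECONDS
    case Precision.nanoseconds  => TimeUnit.NANOSECONDS
  }

  // Assumption: truncating conversion is acceptable. TimeUnit.convert truncates
  // toward zero when going to a coarser unit and saturates at Long.MinValue /
  // Long.MaxValue on overflow, so values before the epoch would need floor
  // semantics (e.g. Math.floorDiv) to land on the correct day.
  def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long =
    toTimeUnit(to).convert(time, toTimeUnit(from))
}
```

With this sketch, an option string such as "milliseconds" could be turned into a Precision value with Precision.withName, and PrecisionSketch.convertWithPrecision(86400000L, Precision.milliseconds, Precision.days) would yield one day.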



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)