Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:25:14 UTC
[jira] [Updated] (SPARK-14428) [SQL] Allow more flexibility when
parsing dates and timestamps in json datasources
[ https://issues.apache.org/jira/browse/SPARK-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-14428:
---------------------------------
Labels: bulk-closed date features json timestamp (was: date features json timestamp)
> [SQL] Allow more flexibility when parsing dates and timestamps in json datasources
> ----------------------------------------------------------------------------------
>
> Key: SPARK-14428
> URL: https://issues.apache.org/jira/browse/SPARK-14428
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.6.1
> Reporter: Michel Lemay
> Priority: Minor
> Labels: bulk-closed, date, features, json, timestamp
>
> Reading JSON with dates and timestamps is limited to predetermined string formats or long values.
> 1) It should be possible to set an option on the JSON datasource to parse dates and timestamps using a custom string format.
> 2) It should be possible to change how long values since the epoch are interpreted, with support for different precisions such as days, seconds, milliseconds, microseconds and nanoseconds.
> Something along the lines of:
> {code}
> object Precision extends Enumeration {
>   val days, seconds, milliseconds, microseconds, nanoseconds = Value
> }
> def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = ...
> ...
> val dateFormat = parameters.getOrElse("dateFormat", "").trim
> val timestampFormat = parameters.getOrElse("timestampFormat", "").trim
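> // Parse the precision options into Precision.Value so they can be passed to convertWithPrecision.
> val longDatePrecision = Precision.withName(parameters.getOrElse("longDatePrecision", "days"))
> val longTimestampPrecision = Precision.withName(parameters.getOrElse("longTimestampPrecision", "milliseconds"))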
> {code}
> and
> {code}
> case (VALUE_STRING, DateType) =>
>   val stringValue = parser.getText
>   val days = if (configOptions.dateFormat.nonEmpty) {
>     // User-defined format; convert the parsed milliseconds to the SQL DATE representation (number of days).
>     val sdf = new SimpleDateFormat(configOptions.dateFormat) // Not thread-safe.
>     DateTimeUtils.convertWithPrecision(sdf.parse(stringValue).getTime, Precision.milliseconds, Precision.days)
>   } else if (stringValue.forall(_.isDigit)) {
>     // Numeric string: interpret it using the configured precision.
>     DateTimeUtils.convertWithPrecision(stringValue.toLong, configOptions.longDatePrecision, Precision.days)
>   } else {
>     // The format of this string will probably be "yyyy-MM-dd".
>     DateTimeUtils.convertWithPrecision(DateTimeUtils.stringToTime(stringValue).getTime, Precision.milliseconds, Precision.days)
>   }
>   days.toInt
> case (VALUE_NUMBER_INT, DateType) =>
>   DateTimeUtils.convertWithPrecision(parser.getLongValue, configOptions.longDatePrecision, Precision.days).toInt
> {code}
> With similar handling for Timestamps.
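
For illustration, the proposal above could be sketched as a standalone snippet. This is not Spark code: the object name PrecisionSketch, the helper parseDateToDays and the nanosPerUnit table are hypothetical, and java.time is swapped in for SimpleDateFormat because DateTimeFormatter is immutable and thread-safe. The body of convertWithPrecision is one possible implementation of the elided function, assuming that conversions to a coarser unit should round toward negative infinity so that pre-epoch values land on the earlier day.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object Precision extends Enumeration {
  val days, seconds, milliseconds, microseconds, nanoseconds = Value
}

// Hypothetical standalone sketch; not part of Spark's DateTimeUtils.
object PrecisionSketch {
  // Number of nanoseconds in one unit of each precision.
  private val nanosPerUnit: Map[Precision.Value, Long] = Map(
    Precision.days         -> 86400000000000L,
    Precision.seconds      -> 1000000000L,
    Precision.milliseconds -> 1000000L,
    Precision.microseconds -> 1000L,
    Precision.nanoseconds  -> 1L
  )

  def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = {
    val f = nanosPerUnit(from)
    val t = nanosPerUnit(to)
    if (f >= t) Math.multiplyExact(time, f / t) // coarser to finer: throws ArithmeticException on overflow
    else Math.floorDiv(time, t / f)             // finer to coarser: floors, so -1 ms maps to day -1
  }

  // Mirrors the VALUE_STRING / DateType branch: custom format first,
  // then an all-digit epoch value, then the ISO "yyyy-MM-dd" fallback.
  def parseDateToDays(value: String, dateFormat: String, longDatePrecision: Precision.Value): Int = {
    val days =
      if (dateFormat.nonEmpty) {
        LocalDate.parse(value, DateTimeFormatter.ofPattern(dateFormat)).toEpochDay
      } else if (value.nonEmpty && value.forall(_.isDigit)) {
        convertWithPrecision(value.toLong, longDatePrecision, Precision.days)
      } else {
        LocalDate.parse(value).toEpochDay
      }
    Math.toIntExact(days)
  }
}
```

With this sketch, PrecisionSketch.parseDateToDays("21/05/2019", "dd/MM/yyyy", Precision.days) and PrecisionSketch.parseDateToDays("1558396800", "", Precision.seconds) both yield 18037, the number of days from the epoch to 2019-05-21. A numeric timestamp could be handled the same way by converting from the configured longTimestampPrecision to whatever unit the timestamp type stores internally.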