You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2017/10/18 14:49:01 UTC

Date Conversion Question

Hello Drillers, 
I have a silly question which I’m a little stuck with.  I have some data in CSV format with dates in the following format:  2017-08-10T09:12:26.000Z.  I’m trying to convert this into a date time data field so that I have both the date and the hours, however I keep running into road blocks.   I’ve tried the CAST( field AS DATE ) but in doing so I lose the time component.  I’ve tried the TO_TIMESTAMP function, however the only success I’ve had is using the substring function to remove the timezone at the end, then use regex_replace to get rid of the literal ’T’ in the middle of the string, then TO_NUMBER.  (See query below)

SELECT TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  ) AS dt,
EXTRACT( 
hour FROM
TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  )
 ) AS dt_hour

I’d think you could do this directly with the TO_TIMESTAMP function however, I can’t figure out how include the literal ’T’ in the formatting string.  The escape character seems to be the single quote which also is the only character allowed to denote the formatting string.  

So, questions:
1.  Is there any way to include a literal character in a joda date format?
2.  Is it possible to use any character besides a single quote to mark the beginning/end of a format string?
3.  Are there any ways to do this that I’m missing?

Thanks!
—C 


Re: Date Conversion Question

Posted by Charles Givre <cg...@gmail.com>.
Hi Julian, 
Alas, this doesn’t work in Drill since Drill uses Joda time formats.  However, you got me thinking about this and I actually got it to work w/o using the substring or other weird string manipulation functions.

SELECT to_timestamp ('2017-08-10T09:12:26.000Z', 'yyyy-MM-dd''T''hh:mm:ss.SSS''Z''') FROM (VALUES(1))

Apparently, the double single quotes act as an escape character for plain text in the format string.  We really should make either the CAST() or the TO_TIMESTAMP a little easier to use as this is really counter-intuitive.
— C



> On Oct 18, 2017, at 12:47, Julian Hyde <jh...@apache.org> wrote:
> 
> A question on StackOverflow asks how to do this using Oracle’s TO_TIMESTAMP function, and there is a solution[1]. So, I tried
> 
>  SELECT to_timestamp ('2017-08-10T09:12:26.000Z', 'YYYY-MM-DD"T"HH24:MI:SS.FF3"Z"')
>  FROM DUAL
> 
> on http://rextester.com/l/oracle_online_compiler <http://rextester.com/l/oracle_online_compiler> and it worked.
> 
> I presume Drill’s TO_TIMESTAMP is based is based on Oracle’s. In which case let’s fix TO_TIMESTAMP.
> 
> Julian
> 
> [1] https://stackoverflow.com/questions/26671557/convert-string-iso-8601-date-to-oracles-timestamp-datatype <https://stackoverflow.com/questions/26671557/convert-string-iso-8601-date-to-oracles-timestamp-datatype> 
> 
>> On Oct 18, 2017, at 7:57 AM, Bob Rudis <bo...@rud.is> wrote:
>> 
>> FWIW I was doing very similar substring (etc) machinations until we
>> started converting output from back-end data-generation tools directly
>> into parquet (using other tools). IMO it's a common enough format (at
>> least in the types of data you and I likely have to work with :-) that
>> it'd be great if there was direct support for it. If there is, I also
>> missed it and would also be most appreciative of which incantations to
>> use to take advantage of it.
>> 
>> On Wed, Oct 18, 2017 at 10:49 AM, Charles Givre <cg...@gmail.com> wrote:
>>> Hello Drillers,
>>> I have a silly question which I’m a little stuck with.  I have some data in CSV format with dates in the following format:  2017-08-10T09:12:26.000Z.  I’m trying to convert this into a date time data field so that I have both the date and the hours, however I keep running into road blocks.   I’ve tried the CAST( field AS DATE ) but in doing so I lose the time component.  I’ve tried the TO_TIMESTAMP function, however the only success I’ve had is using the substring function to remove the timezone at the end, then use regex_replace to get rid of the literal ’T’ in the middle of the string, then TO_NUMBER.  (See query below)
>>> 
>>> SELECT TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  ) AS dt,
>>> EXTRACT(
>>> hour FROM
>>> TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  )
>>> ) AS dt_hour
>>> 
>>> I’d think you could do this directly with the TO_TIMESTAMP function however, I can’t figure out how include the literal ’T’ in the formatting string.  The escape character seems to be the single quote which also is the only character allowed to denote the formatting string.
>>> 
>>> So, questions:
>>> 1.  Is there any way to include a literal character in a joda date format?
>>> 2.  Is it possible to use any character besides a single quote to mark the beginning/end of a format string?
>>> 3.  Are there any ways to do this that I’m missing?
>>> 
>>> Thanks!
>>> —C
>>> 
> 


Re: Date Conversion Question

Posted by Charles Givre <cg...@gmail.com>.
Hi Julian, 
Alas, this doesn’t work in Drill since Drill uses Joda time formats.  However, you got me thinking about this and I actually got it to work w/o using the substring or other weird string manipulation functions.

SELECT to_timestamp ('2017-08-10T09:12:26.000Z', 'yyyy-MM-dd''T''hh:mm:ss.SSS''Z''') FROM (VALUES(1))

Apparently, the double single quotes act as an escape character for plain text in the format string.  We really should make either the CAST() or the TO_TIMESTAMP a little easier to use as this is really counter-intuitive.
— C



> On Oct 18, 2017, at 12:47, Julian Hyde <jh...@apache.org> wrote:
> 
> A question on StackOverflow asks how to do this using Oracle’s TO_TIMESTAMP function, and there is a solution[1]. So, I tried
> 
>  SELECT to_timestamp ('2017-08-10T09:12:26.000Z', 'YYYY-MM-DD"T"HH24:MI:SS.FF3"Z"')
>  FROM DUAL
> 
> on http://rextester.com/l/oracle_online_compiler <http://rextester.com/l/oracle_online_compiler> and it worked.
> 
> I presume Drill’s TO_TIMESTAMP is based is based on Oracle’s. In which case let’s fix TO_TIMESTAMP.
> 
> Julian
> 
> [1] https://stackoverflow.com/questions/26671557/convert-string-iso-8601-date-to-oracles-timestamp-datatype <https://stackoverflow.com/questions/26671557/convert-string-iso-8601-date-to-oracles-timestamp-datatype> 
> 
>> On Oct 18, 2017, at 7:57 AM, Bob Rudis <bo...@rud.is> wrote:
>> 
>> FWIW I was doing very similar substring (etc) machinations until we
>> started converting output from back-end data-generation tools directly
>> into parquet (using other tools). IMO it's a common enough format (at
>> least in the types of data you and I likely have to work with :-) that
>> it'd be great if there was direct support for it. If there is, I also
>> missed it and would also be most appreciative of which incantations to
>> use to take advantage of it.
>> 
>> On Wed, Oct 18, 2017 at 10:49 AM, Charles Givre <cg...@gmail.com> wrote:
>>> Hello Drillers,
>>> I have a silly question which I’m a little stuck with.  I have some data in CSV format with dates in the following format:  2017-08-10T09:12:26.000Z.  I’m trying to convert this into a date time data field so that I have both the date and the hours, however I keep running into road blocks.   I’ve tried the CAST( field AS DATE ) but in doing so I lose the time component.  I’ve tried the TO_TIMESTAMP function, however the only success I’ve had is using the substring function to remove the timezone at the end, then use regex_replace to get rid of the literal ’T’ in the middle of the string, then TO_NUMBER.  (See query below)
>>> 
>>> SELECT TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  ) AS dt,
>>> EXTRACT(
>>> hour FROM
>>> TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  )
>>> ) AS dt_hour
>>> 
>>> I’d think you could do this directly with the TO_TIMESTAMP function however, I can’t figure out how include the literal ’T’ in the formatting string.  The escape character seems to be the single quote which also is the only character allowed to denote the formatting string.
>>> 
>>> So, questions:
>>> 1.  Is there any way to include a literal character in a joda date format?
>>> 2.  Is it possible to use any character besides a single quote to mark the beginning/end of a format string?
>>> 3.  Are there any ways to do this that I’m missing?
>>> 
>>> Thanks!
>>> —C
>>> 
> 


Re: Date Conversion Question

Posted by Julian Hyde <jh...@apache.org>.
A question on StackOverflow asks how to do this using Oracle’s TO_TIMESTAMP function, and there is a solution[1]. So, I tried

  SELECT to_timestamp ('2017-08-10T09:12:26.000Z', 'YYYY-MM-DD"T"HH24:MI:SS.FF3"Z"')
  FROM DUAL

on http://rextester.com/l/oracle_online_compiler <http://rextester.com/l/oracle_online_compiler> and it worked.

I presume Drill’s TO_TIMESTAMP is based is based on Oracle’s. In which case let’s fix TO_TIMESTAMP.

Julian
 
[1] https://stackoverflow.com/questions/26671557/convert-string-iso-8601-date-to-oracles-timestamp-datatype <https://stackoverflow.com/questions/26671557/convert-string-iso-8601-date-to-oracles-timestamp-datatype> 

> On Oct 18, 2017, at 7:57 AM, Bob Rudis <bo...@rud.is> wrote:
> 
> FWIW I was doing very similar substring (etc) machinations until we
> started converting output from back-end data-generation tools directly
> into parquet (using other tools). IMO it's a common enough format (at
> least in the types of data you and I likely have to work with :-) that
> it'd be great if there was direct support for it. If there is, I also
> missed it and would also be most appreciative of which incantations to
> use to take advantage of it.
> 
> On Wed, Oct 18, 2017 at 10:49 AM, Charles Givre <cg...@gmail.com> wrote:
>> Hello Drillers,
>> I have a silly question which I’m a little stuck with.  I have some data in CSV format with dates in the following format:  2017-08-10T09:12:26.000Z.  I’m trying to convert this into a date time data field so that I have both the date and the hours, however I keep running into road blocks.   I’ve tried the CAST( field AS DATE ) but in doing so I lose the time component.  I’ve tried the TO_TIMESTAMP function, however the only success I’ve had is using the substring function to remove the timezone at the end, then use regex_replace to get rid of the literal ’T’ in the middle of the string, then TO_NUMBER.  (See query below)
>> 
>> SELECT TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  ) AS dt,
>> EXTRACT(
>> hour FROM
>> TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  )
>> ) AS dt_hour
>> 
>> I’d think you could do this directly with the TO_TIMESTAMP function however, I can’t figure out how include the literal ’T’ in the formatting string.  The escape character seems to be the single quote which also is the only character allowed to denote the formatting string.
>> 
>> So, questions:
>> 1.  Is there any way to include a literal character in a joda date format?
>> 2.  Is it possible to use any character besides a single quote to mark the beginning/end of a format string?
>> 3.  Are there any ways to do this that I’m missing?
>> 
>> Thanks!
>> —C
>> 


Re: Date Conversion Question

Posted by Bob Rudis <bo...@rud.is>.
FWIW I was doing very similar substring (etc) machinations until we
started converting output from back-end data-generation tools directly
into parquet (using other tools). IMO it's a common enough format (at
least in the types of data you and I likely have to work with :-) that
it'd be great if there was direct support for it. If there is, I also
missed it and would also be most appreciative of which incantations to
use to take advantage of it.

On Wed, Oct 18, 2017 at 10:49 AM, Charles Givre <cg...@gmail.com> wrote:
> Hello Drillers,
> I have a silly question which I’m a little stuck with.  I have some data in CSV format with dates in the following format:  2017-08-10T09:12:26.000Z.  I’m trying to convert this into a date time data field so that I have both the date and the hours, however I keep running into road blocks.   I’ve tried the CAST( field AS DATE ) but in doing so I lose the time component.  I’ve tried the TO_TIMESTAMP function, however the only success I’ve had is using the substring function to remove the timezone at the end, then use regex_replace to get rid of the literal ’T’ in the middle of the string, then TO_NUMBER.  (See query below)
>
> SELECT TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  ) AS dt,
> EXTRACT(
> hour FROM
> TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  )
>  ) AS dt_hour
>
> I’d think you could do this directly with the TO_TIMESTAMP function however, I can’t figure out how include the literal ’T’ in the formatting string.  The escape character seems to be the single quote which also is the only character allowed to denote the formatting string.
>
> So, questions:
> 1.  Is there any way to include a literal character in a joda date format?
> 2.  Is it possible to use any character besides a single quote to mark the beginning/end of a format string?
> 3.  Are there any ways to do this that I’m missing?
>
> Thanks!
> —C
>

Re: Date Conversion Question

Posted by Bob Rudis <bo...@rud.is>.
FWIW I was doing very similar substring (etc) machinations until we
started converting output from back-end data-generation tools directly
into parquet (using other tools). IMO it's a common enough format (at
least in the types of data you and I likely have to work with :-) that
it'd be great if there was direct support for it. If there is, I also
missed it and would also be most appreciative of which incantations to
use to take advantage of it.

On Wed, Oct 18, 2017 at 10:49 AM, Charles Givre <cg...@gmail.com> wrote:
> Hello Drillers,
> I have a silly question which I’m a little stuck with.  I have some data in CSV format with dates in the following format:  2017-08-10T09:12:26.000Z.  I’m trying to convert this into a date time data field so that I have both the date and the hours, however I keep running into road blocks.   I’ve tried the CAST( field AS DATE ) but in doing so I lose the time component.  I’ve tried the TO_TIMESTAMP function, however the only success I’ve had is using the substring function to remove the timezone at the end, then use regex_replace to get rid of the literal ’T’ in the middle of the string, then TO_NUMBER.  (See query below)
>
> SELECT TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  ) AS dt,
> EXTRACT(
> hour FROM
> TO_TIMESTAMP( REGEXP_REPLACE( SUBSTR( <field>, 1,19), 'T', ' '), 'yyyy-MM-dd HH:mm:ss'  )
>  ) AS dt_hour
>
> I’d think you could do this directly with the TO_TIMESTAMP function however, I can’t figure out how include the literal ’T’ in the formatting string.  The escape character seems to be the single quote which also is the only character allowed to denote the formatting string.
>
> So, questions:
> 1.  Is there any way to include a literal character in a joda date format?
> 2.  Is it possible to use any character besides a single quote to mark the beginning/end of a format string?
> 3.  Are there any ways to do this that I’m missing?
>
> Thanks!
> —C
>