You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Charles Givre <cg...@gmail.com> on 2020/02/06 13:05:00 UTC

Timestamp Issue

Hi Drill Devs
I'm having a small issue interpreting timestamps from data.  The data in question is in both CSV and parquet format, and has dates encoded as strings in the following format:

1998-07-14T04:00:00

The issue I'm encountering is dealing with the literal T.  The JODA instructions state that you can escape a literal with a single quote.
IE:

yyyy-MM-dd'T'hh:MM:00

However, the issue here is that since Drill does not allow double quotes, all these need to be escaped. 

yyyy-MM-dd\'T\'hh:MM:00

But... this just doesn't seem to work.  I'm using the TO_TIMESTAMP() function.  Any suggestions?
Thanks,
-- C

Re: Timestamp Issue

Posted by "Jaimes, Rafael" <Ra...@ll.mit.edu>.
Hi Charles,

Not sure if you fixed your problem or not but thought I would chime in 
because I was running into similar issues.

I believe it has to do with the fact that Drill uses the Timestamp class 
defined in Java/JDBC. The Timestamp class is formatted without the T or 
Z delimiters,
so technically it is not ISO-8601 compliant. 
https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp.html

To my surprise, I found that doing a string compare on datetimes stored 
as strings in ISO-8601 format in Parquet were equally as fast as CAST 
datatype to TIMESTAMP and comparing it against the DATETIME object in a 
different column in Parquet.

Just to clarify, my datetimes in the string column have a T in the 
middle and Z on the end, and the datetimes in the actual datetime column 
have a space, with no timezone information.

On 2/7/20 5:07 AM, Igor Guzenko wrote:
> Hello Charles,
>
> It seems to work when I use one additional ' for escaping, for example try
> using 'yyyy-MM-dd''T''HH:mm:ss' for format.
>
> Thanks,
> Igor
>
> On Thu, Feb 6, 2020 at 11:43 PM Ted Dunning <te...@gmail.com> wrote:
>
>> That is really frustrating because that timestamp is literally in an ISO
>> 8601 format.
>>
>> https://en.wikipedia.org/wiki/ISO_8601
>>
>> It would be nice if these formats just worked by default.
>>
>>
>>
>>
>> On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:
>>
>>> Hi Drill Devs
>>> I'm having a small issue interpreting timestamps from data.  The data in
>>> question is in both CSV and parquet format, and has dates encoded as
>>> strings in the following format:
>>>
>>> 1998-07-14T04:00:00
>>>
>>> The issue I'm encountering is dealing with the literal T.  The JODA
>>> instructions state that you can escape a literal with a single quote.
>>> IE:
>>>
>>> yyyy-MM-dd'T'hh:MM:00
>>>
>>> However, the issue here is that since Drill does not allow double quotes,
>>> all these need to be escaped.
>>>
>>> yyyy-MM-dd\'T\'hh:MM:00
>>>
>>> But... this just doesn't seem to work.  I'm using the TO_TIMESTAMP()
>>> function.  Any suggestions?
>>> Thanks,
>>> -- C


Re: Timestamp Issue

Posted by Igor Guzenko <ih...@gmail.com>.
Hello Charles,

It seems to work when I use one additional ' for escaping, for example try
using 'yyyy-MM-dd''T''HH:mm:ss' for format.

Thanks,
Igor

On Thu, Feb 6, 2020 at 11:43 PM Ted Dunning <te...@gmail.com> wrote:

> That is really frustrating because that timestamp is literally in an ISO
> 8601 format.
>
> https://en.wikipedia.org/wiki/ISO_8601
>
> It would be nice if these formats just worked by default.
>
>
>
>
> On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:
>
> > Hi Drill Devs
> > I'm having a small issue interpreting timestamps from data.  The data in
> > question is in both CSV and parquet format, and has dates encoded as
> > strings in the following format:
> >
> > 1998-07-14T04:00:00
> >
> > The issue I'm encountering is dealing with the literal T.  The JODA
> > instructions state that you can escape a literal with a single quote.
> > IE:
> >
> > yyyy-MM-dd'T'hh:MM:00
> >
> > However, the issue here is that since Drill does not allow double quotes,
> > all these need to be escaped.
> >
> > yyyy-MM-dd\'T\'hh:MM:00
> >
> > But... this just doesn't seem to work.  I'm using the TO_TIMESTAMP()
> > function.  Any suggestions?
> > Thanks,
> > -- C
>

Re: Timestamp Issue

Posted by Igor Guzenko <ih...@gmail.com>.
Hello Charles,

It seems to work when I use one additional ' for escaping, for example try
using 'yyyy-MM-dd''T''HH:mm:ss' for format.

Thanks,
Igor

On Thu, Feb 6, 2020 at 11:43 PM Ted Dunning <te...@gmail.com> wrote:

> That is really frustrating because that timestamp is literally in an ISO
> 8601 format.
>
> https://en.wikipedia.org/wiki/ISO_8601
>
> It would be nice if these formats just worked by default.
>
>
>
>
> On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:
>
> > Hi Drill Devs
> > I'm having a small issue interpreting timestamps from data.  The data in
> > question is in both CSV and parquet format, and has dates encoded as
> > strings in the following format:
> >
> > 1998-07-14T04:00:00
> >
> > The issue I'm encountering is dealing with the literal T.  The JODA
> > instructions state that you can escape a literal with a single quote.
> > IE:
> >
> > yyyy-MM-dd'T'hh:MM:00
> >
> > However, the issue here is that since Drill does not allow double quotes,
> > all these need to be escaped.
> >
> > yyyy-MM-dd\'T\'hh:MM:00
> >
> > But... this just doesn't seem to work.  I'm using the TO_TIMESTAMP()
> > function.  Any suggestions?
> > Thanks,
> > -- C
>

Re: Timestamp Issue

Posted by Ted Dunning <te...@gmail.com>.
That is really frustrating because that timestamp is literally in an ISO
8601 format.

https://en.wikipedia.org/wiki/ISO_8601

It would be nice if these formats just worked by default.




On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:

> Hi Drill Devs
> I'm having a small issue interpreting timestamps from data.  The data in
> question is in both CSV and parquet format, and has dates encoded as
> strings in the following format:
>
> 1998-07-14T04:00:00
>
> The issue I'm encountering is dealing with the literal T.  The JODA
> instructions state that you can escape a literal with a single quote.
> IE:
>
> yyyy-MM-dd'T'hh:MM:00
>
> However, the issue here is that since Drill does not allow double quotes,
> all these need to be escaped.
>
> yyyy-MM-dd\'T\'hh:MM:00
>
> But... this just doesn't seem to work.  I'm using the TO_TIMESTAMP()
> function.  Any suggestions?
> Thanks,
> -- C

Re: Timestamp Issue

Posted by Ted Dunning <te...@gmail.com>.
That is really frustrating because that timestamp is literally in an ISO
8601 format.

https://en.wikipedia.org/wiki/ISO_8601

It would be nice if these formats just worked by default.




On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:

> Hi Drill Devs
> I'm having a small issue interpreting timestamps from data.  The data in
> question is in both CSV and parquet format, and has dates encoded as
> strings in the following format:
>
> 1998-07-14T04:00:00
>
> The issue I'm encountering is dealing with the literal T.  The JODA
> instructions state that you can escape a literal with a single quote.
> IE:
>
> yyyy-MM-dd'T'hh:MM:00
>
> However, the issue here is that since Drill does not allow double quotes,
> all these need to be escaped.
>
> yyyy-MM-dd\'T\'hh:MM:00
>
> But... this just doesn't seem to work.  I'm using the TO_TIMESTAMP()
> function.  Any suggestions?
> Thanks,
> -- C