You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Charles Givre <cg...@gmail.com> on 2020/02/06 13:05:00 UTC
Timestamp Issue
Hi Drill Devs
I'm having a small issue interpreting timestamps from data. The data in question is in both CSV and parquet format, and has dates encoded as strings in the following format:
1998-07-14T04:00:00
The issue I'm encountering is dealing with the literal T. The JODA instructions state that you can escape a literal with a single quote.
IE:
yyyy-MM-dd'T'hh:MM:00
However, the issue here is that since Drill does not allow double quotes, all these need to be escaped.
yyyy-MM-dd\'T\'hh:MM:00
But... this just doesn't seem to work. I'm using the TO_TIMESTAMP() function. Any suggestions?
Thanks,
-- C
Re: Timestamp Issue
Posted by "Jaimes, Rafael" <Ra...@ll.mit.edu>.
Hi Charles,
Not sure if you fixed your problem or not but thought I would chime in
because I was running into similar issues.
I believe it has to do with the fact that Drill uses the Timestamp class
defined in Java/JDBC. The Timestamp class is formatted without the T or
Z delimiters,
so technically it is not ISO-8601 compliant.
https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp.html
To my surprise, I found that doing a string compare on datetimes stored
as strings in ISO-8601 format in Parquet were equally as fast as CAST
datatype to TIMESTAMP and comparing it against the DATETIME object in a
different column in Parquet.
Just to clarify, my datetimes in the string column have a T in the
middle and Z on the end, and the datetimes in the actual datetime column
have a space, with no timezone information.
On 2/7/20 5:07 AM, Igor Guzenko wrote:
> Hello Charles,
>
> It seems to work when I use one additional ' for escaping, for example try
> using 'yyyy-MM-dd''T''HH:mm:ss' for format.
>
> Thanks,
> Igor
>
> On Thu, Feb 6, 2020 at 11:43 PM Ted Dunning <te...@gmail.com> wrote:
>
>> That is really frustrating because that timestamp is literally in an ISO
>> 8601 format.
>>
>> https://en.wikipedia.org/wiki/ISO_8601
>>
>> It would be nice if these formats just worked by default.
>>
>>
>>
>>
>> On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:
>>
>>> Hi Drill Devs
>>> I'm having a small issue interpreting timestamps from data. The data in
>>> question is in both CSV and parquet format, and has dates encoded as
>>> strings in the following format:
>>>
>>> 1998-07-14T04:00:00
>>>
>>> The issue I'm encountering is dealing with the literal T. The JODA
>>> instructions state that you can escape a literal with a single quote.
>>> IE:
>>>
>>> yyyy-MM-dd'T'hh:MM:00
>>>
>>> However, the issue here is that since Drill does not allow double quotes,
>>> all these need to be escaped.
>>>
>>> yyyy-MM-dd\'T\'hh:MM:00
>>>
>>> But... this just doesn't seem to work. I'm using the TO_TIMESTAMP()
>>> function. Any suggestions?
>>> Thanks,
>>> -- C
Re: Timestamp Issue
Posted by Igor Guzenko <ih...@gmail.com>.
Hello Charles,
It seems to work when I use one additional ' for escaping, for example try
using 'yyyy-MM-dd''T''HH:mm:ss' for format.
Thanks,
Igor
On Thu, Feb 6, 2020 at 11:43 PM Ted Dunning <te...@gmail.com> wrote:
> That is really frustrating because that timestamp is literally in an ISO
> 8601 format.
>
> https://en.wikipedia.org/wiki/ISO_8601
>
> It would be nice if these formats just worked by default.
>
>
>
>
> On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:
>
> > Hi Drill Devs
> > I'm having a small issue interpreting timestamps from data. The data in
> > question is in both CSV and parquet format, and has dates encoded as
> > strings in the following format:
> >
> > 1998-07-14T04:00:00
> >
> > The issue I'm encountering is dealing with the literal T. The JODA
> > instructions state that you can escape a literal with a single quote.
> > IE:
> >
> > yyyy-MM-dd'T'hh:MM:00
> >
> > However, the issue here is that since Drill does not allow double quotes,
> > all these need to be escaped.
> >
> > yyyy-MM-dd\'T\'hh:MM:00
> >
> > But... this just doesn't seem to work. I'm using the TO_TIMESTAMP()
> > function. Any suggestions?
> > Thanks,
> > -- C
>
Re: Timestamp Issue
Posted by Igor Guzenko <ih...@gmail.com>.
Hello Charles,
It seems to work when I use one additional ' for escaping, for example try
using 'yyyy-MM-dd''T''HH:mm:ss' for format.
Thanks,
Igor
On Thu, Feb 6, 2020 at 11:43 PM Ted Dunning <te...@gmail.com> wrote:
> That is really frustrating because that timestamp is literally in an ISO
> 8601 format.
>
> https://en.wikipedia.org/wiki/ISO_8601
>
> It would be nice if these formats just worked by default.
>
>
>
>
> On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:
>
> > Hi Drill Devs
> > I'm having a small issue interpreting timestamps from data. The data in
> > question is in both CSV and parquet format, and has dates encoded as
> > strings in the following format:
> >
> > 1998-07-14T04:00:00
> >
> > The issue I'm encountering is dealing with the literal T. The JODA
> > instructions state that you can escape a literal with a single quote.
> > IE:
> >
> > yyyy-MM-dd'T'hh:MM:00
> >
> > However, the issue here is that since Drill does not allow double quotes,
> > all these need to be escaped.
> >
> > yyyy-MM-dd\'T\'hh:MM:00
> >
> > But... this just doesn't seem to work. I'm using the TO_TIMESTAMP()
> > function. Any suggestions?
> > Thanks,
> > -- C
>
Re: Timestamp Issue
Posted by Ted Dunning <te...@gmail.com>.
That is really frustrating because that timestamp is literally in an ISO
8601 format.
https://en.wikipedia.org/wiki/ISO_8601
It would be nice if these formats just worked by default.
On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:
> Hi Drill Devs
> I'm having a small issue interpreting timestamps from data. The data in
> question is in both CSV and parquet format, and has dates encoded as
> strings in the following format:
>
> 1998-07-14T04:00:00
>
> The issue I'm encountering is dealing with the literal T. The JODA
> instructions state that you can escape a literal with a single quote.
> IE:
>
> yyyy-MM-dd'T'hh:MM:00
>
> However, the issue here is that since Drill does not allow double quotes,
> all these need to be escaped.
>
> yyyy-MM-dd\'T\'hh:MM:00
>
> But... this just doesn't seem to work. I'm using the TO_TIMESTAMP()
> function. Any suggestions?
> Thanks,
> -- C
Re: Timestamp Issue
Posted by Ted Dunning <te...@gmail.com>.
That is really frustrating because that timestamp is literally in an ISO
8601 format.
https://en.wikipedia.org/wiki/ISO_8601
It would be nice if these formats just worked by default.
On Thu, Feb 6, 2020 at 5:05 AM Charles Givre <cg...@gmail.com> wrote:
> Hi Drill Devs
> I'm having a small issue interpreting timestamps from data. The data in
> question is in both CSV and parquet format, and has dates encoded as
> strings in the following format:
>
> 1998-07-14T04:00:00
>
> The issue I'm encountering is dealing with the literal T. The JODA
> instructions state that you can escape a literal with a single quote.
> IE:
>
> yyyy-MM-dd'T'hh:MM:00
>
> However, the issue here is that since Drill does not allow double quotes,
> all these need to be escaped.
>
> yyyy-MM-dd\'T\'hh:MM:00
>
> But... this just doesn't seem to work. I'm using the TO_TIMESTAMP()
> function. Any suggestions?
> Thanks,
> -- C