You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@drill.apache.org by Vince Gonzalez <vi...@gmail.com> on 2015/04/01 21:54:36 UTC

more on parsing timestamps

Is this a bug?

Created a parquet table (using CTAS) with one column containing text
timestamps.

0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
+------------+
|     t      |
+------------+
| 2015-01-27T13:43:53.000Z |
+------------+
1 row selected (0.119 seconds)

The below queries, identical apart from the limit clause, behave
differently. The one with the limit clause works, the one without doesn't.
The limit is larger than the total number of rows, so in both cases we
should be processing all rows.

No limit clause. It fails:

```
0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
Query failed: RemoteRpcException: Failure while trying to start remote
fragment, Expression has syntax error! line 1:30:mismatched input 'T'
expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
192.168.99.1:31010 ]
```

Limit clause in the subselect (larger than the number of rows in the table)
succeeds.

```
0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
100000000) as t;
...
| 2015-02-17 07:18:00.0 |
+------------+
13,015,350 rows selected (105.257 seconds)
```

Data can be downloaded here:

https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz

Re: more on parsing timestamps

Posted by Sudhakar Thota <st...@maprtech.com>.

Steven,

Thanks. Andries already pointed that to me and I opened public jira.

https://issues.apache.org/jira/browse/DRILL-2669

Thanks
Sudhakar Thota


On Apr 2, 2015, at 4:09 PM, Steven Phillips <sp...@maprtech.com> wrote:

> Could you please file a public jira. That link is to an internal issue.
> 
> On Thu, Apr 2, 2015 at 1:52 PM, Sudhakar Thota <st...@maprtech.com> wrote:
> 
>> Here it is:
>> 
>> https://maprdrill.atlassian.net/browse/MD-204?filter=-2
>> 
>> Thanks
>> Sudhakar Thota
>> 
>> 
>> On Apr 2, 2015, at 1:36 PM, Andries Engelbrecht <ae...@maprtech.com>
>> wrote:
>> 
>>> Cool, thx for testing.
>>> 
>>> Best to file a JIRA.
>>> 
>>> —Andries
>>> 
>>> On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <st...@maprtech.com> wrote:
>>> 
>>>> Vince/Andries,
>>>> 
>>>> Perhaps this could be a bug. I get the same results.
>>>> 
>>>> But the plan is very different, the UnionExchange is set up immediately
>> after the scan operation in successful case( Case -1 ), where as
>> UnionExchange is happening after scan->project (Case -2).
>>>> 
>>>> Case -1.Successful case.
>>>> 
>>>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t,
>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
>> dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
>>>> +------------+------------+
>>>> |    text    |    json    |
>>>> +------------+------------+
>>>> | 00-00    Screen
>>>> 00-01      Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'),
>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>>>> 00-02        SelectionVectorRemover
>>>> 00-03          Limit(fetch=[13015351])
>>>> 00-04            UnionExchange
>>>> 01-01              Scan(groupscan=[ParquetGroupScan
>> [entries=[ReadEntryWithPath [path=maprfs:/mapr/
>> demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
>> ReadEntryWithPath [path=maprfs:/mapr/
>> demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
>> ReadEntryWithPath [path=maprfs:/mapr/
>> demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
>> selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test,
>> numFiles=3, columns=[`*`]]])
>>>> | {
>>>> "head" : {
>>>>  "version" : 1,
>>>>  "generator" : {
>>>>    "type" : "ExplainHandler",
>>>>    "info" : ""
>>>>  },
>>>>  "type" : "APACHE_DRILL_PHYSICAL",
>>>>  "options" : [ ],
>>>>  "queue" : 0,
>>>>  "resultMode" : "EXEC"
>>>> },
>>>> 
>>>> Case -2. Unsuccessful case:
>>>> 
>>>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t,
>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
>> dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
>>>> +------------+------------+
>>>> |    text    |    json    |
>>>> +------------+------------+
>>>> | 00-00    Screen
>>>> 00-01      UnionExchange
>>>> 01-01        Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'),
>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>>>> 01-02          Scan(groupscan=[ParquetGroupScan
>> [entries=[ReadEntryWithPath [path=maprfs:/mapr/
>> demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
>> ReadEntryWithPath [path=maprfs:/mapr/
>> demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
>> ReadEntryWithPath [path=maprfs:/mapr/
>> demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
>> selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test,
>> numFiles=3, columns=[`*`]]])
>>>> | {
>>>> "head" : {
>>>>  "version" : 1,
>>>>  "generator" : {
>>>>    "type" : "ExplainHandler",
>>>>    "info" : ""
>>>>  },
>>>>  "type" : "APACHE_DRILL_PHYSICAL",
>>>>  "options" : [ ],
>>>>  "queue" : 0,
>>>>  "resultMode" : "EXEC"
>>>> },
>>>> 
>>>> Thanks
>>>> Sudhakar Thota
>>>> 
>>>> 
>>>> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <vi...@gmail.com>
>> wrote:
>>>> 
>>>>> Ok, will do. Thanks.
>>>>> 
>>>>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
>>>>> aengelbrecht@maprtech.com> wrote:
>>>>> 
>>>>>> Compare the query plans and you probably want to look at the log file
>> to
>>>>>> see what fails and post here.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> —Andries
>>>>>> 
>>>>>> 
>>>>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vince.gonzalez@gmail.com
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Is this a bug?
>>>>>>> 
>>>>>>> Created a parquet table (using CTAS) with one column containing text
>>>>>>> timestamps.
>>>>>>> 
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>>>>>> +------------+
>>>>>>> |     t      |
>>>>>>> +------------+
>>>>>>> | 2015-01-27T13:43:53.000Z |
>>>>>>> +------------+
>>>>>>> 1 row selected (0.119 seconds)
>>>>>>> 
>>>>>>> The below queries, identical apart from the limit clause, behave
>>>>>>> differently. The one with the limit clause works, the one without
>>>>>> doesn't.
>>>>>>> The limit is larger than the total number of rows, so in both cases
>> we
>>>>>>> should be processing all rows.
>>>>>>> 
>>>>>>> No limit clause. It fails:
>>>>>>> 
>>>>>>> ```
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test)
>> as
>>>>>> t;
>>>>>>> Query failed: RemoteRpcException: Failure while trying to start
>> remote
>>>>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
>>>>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
>>>>>>> 192.168.99.1:31010 ]
>>>>>>> ```
>>>>>>> 
>>>>>>> Limit clause in the subselect (larger than the number of rows in the
>>>>>> table)
>>>>>>> succeeds.
>>>>>>> 
>>>>>>> ```
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test
>> limit
>>>>>>> 100000000) as t;
>>>>>>> ...
>>>>>>> | 2015-02-17 07:18:00.0 |
>>>>>>> +------------+
>>>>>>> 13,015,350 rows selected (105.257 seconds)
>>>>>>> ```
>>>>>>> 
>>>>>>> Data can be downloaded here:
>>>>>>> 
>>>>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> Steven Phillips
> Software Engineer
> 
> mapr.com

Re: more on parsing timestamps

Posted by Steven Phillips <sp...@maprtech.com>.

Could you please file a public jira. That link is to an internal issue.

On Thu, Apr 2, 2015 at 1:52 PM, Sudhakar Thota <st...@maprtech.com> wrote:

> Here it is:
>
> https://maprdrill.atlassian.net/browse/MD-204?filter=-2
>
> Thanks
> Sudhakar Thota
>
>
> On Apr 2, 2015, at 1:36 PM, Andries Engelbrecht <ae...@maprtech.com>
> wrote:
>
> > Cool, thx for testing.
> >
> > Best to file a JIRA.
> >
> > —Andries
> >
> > On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <st...@maprtech.com> wrote:
> >
> >> Vince/Andries,
> >>
> >> Perhaps this could be a bug. I get the same results.
> >>
> >> But the plan is very different, the UnionExchange is set up immediately
> after the scan operation in successful case( Case -1 ), where as
> UnionExchange is happening after scan->project (Case -2).
> >>
> >> Case -1.Successful case.
> >>
> >> 0: jdbc:drill:> explain plan for select to_timestamp(t.t,
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
> dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
> >> +------------+------------+
> >> |    text    |    json    |
> >> +------------+------------+
> >> | 00-00    Screen
> >> 00-01      Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'),
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> >> 00-02        SelectionVectorRemover
> >> 00-03          Limit(fetch=[13015351])
> >> 00-04            UnionExchange
> >> 01-01              Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:/mapr/
> demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
> ReadEntryWithPath [path=maprfs:/mapr/
> demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
> ReadEntryWithPath [path=maprfs:/mapr/
> demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
> selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test,
> numFiles=3, columns=[`*`]]])
> >> | {
> >> "head" : {
> >>   "version" : 1,
> >>   "generator" : {
> >>     "type" : "ExplainHandler",
> >>     "info" : ""
> >>   },
> >>   "type" : "APACHE_DRILL_PHYSICAL",
> >>   "options" : [ ],
> >>   "queue" : 0,
> >>   "resultMode" : "EXEC"
> >> },
> >>
> >> Case -2. Unsuccessful case:
> >>
> >> 0: jdbc:drill:> explain plan for select to_timestamp(t.t,
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
> dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
> >> +------------+------------+
> >> |    text    |    json    |
> >> +------------+------------+
> >> | 00-00    Screen
> >> 00-01      UnionExchange
> >> 01-01        Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'),
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> >> 01-02          Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:/mapr/
> demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
> ReadEntryWithPath [path=maprfs:/mapr/
> demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
> ReadEntryWithPath [path=maprfs:/mapr/
> demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
> selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test,
> numFiles=3, columns=[`*`]]])
> >> | {
> >> "head" : {
> >>   "version" : 1,
> >>   "generator" : {
> >>     "type" : "ExplainHandler",
> >>     "info" : ""
> >>   },
> >>   "type" : "APACHE_DRILL_PHYSICAL",
> >>   "options" : [ ],
> >>   "queue" : 0,
> >>   "resultMode" : "EXEC"
> >> },
> >>
> >> Thanks
> >> Sudhakar Thota
> >>
> >>
> >> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <vi...@gmail.com>
> wrote:
> >>
> >>> Ok, will do. Thanks.
> >>>
> >>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
> >>> aengelbrecht@maprtech.com> wrote:
> >>>
> >>>> Compare the query plans and you probably want to look at the log file
> to
> >>>> see what fails and post here.
> >>>>
> >>>>
> >>>>
> >>>> —Andries
> >>>>
> >>>>
> >>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vince.gonzalez@gmail.com
> >
> >>>> wrote:
> >>>>
> >>>>> Is this a bug?
> >>>>>
> >>>>> Created a parquet table (using CTAS) with one column containing text
> >>>>> timestamps.
> >>>>>
> >>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
> >>>>> +------------+
> >>>>> |     t      |
> >>>>> +------------+
> >>>>> | 2015-01-27T13:43:53.000Z |
> >>>>> +------------+
> >>>>> 1 row selected (0.119 seconds)
> >>>>>
> >>>>> The below queries, identical apart from the limit clause, behave
> >>>>> differently. The one with the limit clause works, the one without
> >>>> doesn't.
> >>>>> The limit is larger than the total number of rows, so in both cases
> we
> >>>>> should be processing all rows.
> >>>>>
> >>>>> No limit clause. It fails:
> >>>>>
> >>>>> ```
> >>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
> >>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test)
> as
> >>>> t;
> >>>>> Query failed: RemoteRpcException: Failure while trying to start
> remote
> >>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
> >>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
> >>>>> 192.168.99.1:31010 ]
> >>>>> ```
> >>>>>
> >>>>> Limit clause in the subselect (larger than the number of rows in the
> >>>> table)
> >>>>> succeeds.
> >>>>>
> >>>>> ```
> >>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
> >>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test
> limit
> >>>>> 100000000) as t;
> >>>>> ...
> >>>>> | 2015-02-17 07:18:00.0 |
> >>>>> +------------+
> >>>>> 13,015,350 rows selected (105.257 seconds)
> >>>>> ```
> >>>>>
> >>>>> Data can be downloaded here:
> >>>>>
> >>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
> >>>>
> >>>>
> >>
> >
>
>


-- 
 Steven Phillips
 Software Engineer

 mapr.com

Re: more on parsing timestamps

Posted by Sudhakar Thota <st...@maprtech.com>.

Here it is:

https://maprdrill.atlassian.net/browse/MD-204?filter=-2

Thanks
Sudhakar Thota


On Apr 2, 2015, at 1:36 PM, Andries Engelbrecht <ae...@maprtech.com> wrote:

> Cool, thx for testing. 
> 
> Best to file a JIRA.
> 
> —Andries
> 
> On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <st...@maprtech.com> wrote:
> 
>> Vince/Andries,
>> 
>> Perhaps this could be a bug. I get the same results. 
>> 
>> But the plan is very different, the UnionExchange is set up immediately after the scan operation in successful case( Case -1 ), where as UnionExchange is happening after scan->project (Case -2).
>> 
>> Case -1.Successful case.
>> 
>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
>> +------------+------------+
>> |    text    |    json    |
>> +------------+------------+
>> | 00-00    Screen
>> 00-01      Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>> 00-02        SelectionVectorRemover
>> 00-03          Limit(fetch=[13015351])
>> 00-04            UnionExchange
>> 01-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
>> | {
>> "head" : {
>>   "version" : 1,
>>   "generator" : {
>>     "type" : "ExplainHandler",
>>     "info" : ""
>>   },
>>   "type" : "APACHE_DRILL_PHYSICAL",
>>   "options" : [ ],
>>   "queue" : 0,
>>   "resultMode" : "EXEC"
>> },
>> 
>> Case -2. Unsuccessful case:
>> 
>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
>> +------------+------------+
>> |    text    |    json    |
>> +------------+------------+
>> | 00-00    Screen
>> 00-01      UnionExchange
>> 01-01        Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>> 01-02          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
>> | {
>> "head" : {
>>   "version" : 1,
>>   "generator" : {
>>     "type" : "ExplainHandler",
>>     "info" : ""
>>   },
>>   "type" : "APACHE_DRILL_PHYSICAL",
>>   "options" : [ ],
>>   "queue" : 0,
>>   "resultMode" : "EXEC"
>> },
>> 
>> Thanks
>> Sudhakar Thota
>> 
>> 
>> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <vi...@gmail.com> wrote:
>> 
>>> Ok, will do. Thanks.
>>> 
>>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
>>> aengelbrecht@maprtech.com> wrote:
>>> 
>>>> Compare the query plans and you probably want to look at the log file to
>>>> see what fails and post here.
>>>> 
>>>> 
>>>> 
>>>> —Andries
>>>> 
>>>> 
>>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vi...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Is this a bug?
>>>>> 
>>>>> Created a parquet table (using CTAS) with one column containing text
>>>>> timestamps.
>>>>> 
>>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>>>> +------------+
>>>>> |     t      |
>>>>> +------------+
>>>>> | 2015-01-27T13:43:53.000Z |
>>>>> +------------+
>>>>> 1 row selected (0.119 seconds)
>>>>> 
>>>>> The below queries, identical apart from the limit clause, behave
>>>>> differently. The one with the limit clause works, the one without
>>>> doesn't.
>>>>> The limit is larger than the total number of rows, so in both cases we
>>>>> should be processing all rows.
>>>>> 
>>>>> No limit clause. It fails:
>>>>> 
>>>>> ```
>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as
>>>> t;
>>>>> Query failed: RemoteRpcException: Failure while trying to start remote
>>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
>>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
>>>>> 192.168.99.1:31010 ]
>>>>> ```
>>>>> 
>>>>> Limit clause in the subselect (larger than the number of rows in the
>>>> table)
>>>>> succeeds.
>>>>> 
>>>>> ```
>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
>>>>> 100000000) as t;
>>>>> ...
>>>>> | 2015-02-17 07:18:00.0 |
>>>>> +------------+
>>>>> 13,015,350 rows selected (105.257 seconds)
>>>>> ```
>>>>> 
>>>>> Data can be downloaded here:
>>>>> 
>>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>>>> 
>>>> 
>> 
>

Re: more on parsing timestamps

Posted by Andries Engelbrecht <ae...@maprtech.com>.

Cool, thx for testing. 

Best to file a JIRA.

—Andries

On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <st...@maprtech.com> wrote:

> Vince/Andries,
> 
> Perhaps this could be a bug. I get the same results. 
> 
> But the plan is very different, the UnionExchange is set up immediately after the scan operation in successful case( Case -1 ), where as UnionExchange is happening after scan->project (Case -2).
> 
> Case -1.Successful case.
> 
> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[13015351])
> 00-04            UnionExchange
> 01-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
> | {
>  "head" : {
>    "version" : 1,
>    "generator" : {
>      "type" : "ExplainHandler",
>      "info" : ""
>    },
>    "type" : "APACHE_DRILL_PHYSICAL",
>    "options" : [ ],
>    "queue" : 0,
>    "resultMode" : "EXEC"
>  },
> 
> Case -2. Unsuccessful case:
> 
> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      UnionExchange
> 01-01        Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> 01-02          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
> | {
>  "head" : {
>    "version" : 1,
>    "generator" : {
>      "type" : "ExplainHandler",
>      "info" : ""
>    },
>    "type" : "APACHE_DRILL_PHYSICAL",
>    "options" : [ ],
>    "queue" : 0,
>    "resultMode" : "EXEC"
>  },
> 
> Thanks
> Sudhakar Thota
> 
> 
> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <vi...@gmail.com> wrote:
> 
>> Ok, will do. Thanks.
>> 
>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
>> aengelbrecht@maprtech.com> wrote:
>> 
>>> Compare the query plans and you probably want to look at the log file to
>>> see what fails and post here.
>>> 
>>> 
>>> 
>>> —Andries
>>> 
>>> 
>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vi...@gmail.com>
>>> wrote:
>>> 
>>>> Is this a bug?
>>>> 
>>>> Created a parquet table (using CTAS) with one column containing text
>>>> timestamps.
>>>> 
>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>>> +------------+
>>>> |     t      |
>>>> +------------+
>>>> | 2015-01-27T13:43:53.000Z |
>>>> +------------+
>>>> 1 row selected (0.119 seconds)
>>>> 
>>>> The below queries, identical apart from the limit clause, behave
>>>> differently. The one with the limit clause works, the one without
>>> doesn't.
>>>> The limit is larger than the total number of rows, so in both cases we
>>>> should be processing all rows.
>>>> 
>>>> No limit clause. It fails:
>>>> 
>>>> ```
>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as
>>> t;
>>>> Query failed: RemoteRpcException: Failure while trying to start remote
>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
>>>> 192.168.99.1:31010 ]
>>>> ```
>>>> 
>>>> Limit clause in the subselect (larger than the number of rows in the
>>> table)
>>>> succeeds.
>>>> 
>>>> ```
>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
>>>> 100000000) as t;
>>>> ...
>>>> | 2015-02-17 07:18:00.0 |
>>>> +------------+
>>>> 13,015,350 rows selected (105.257 seconds)
>>>> ```
>>>> 
>>>> Data can be downloaded here:
>>>> 
>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>>> 
>>> 
>

Re: more on parsing timestamps

Posted by Sudhakar Thota <st...@maprtech.com>.

Vince/Andries,

Perhaps this could be a bug. I get the same results. 

But the plan is very different, the UnionExchange is set up immediately after the scan operation in successful case( Case -1 ), where as UnionExchange is happening after scan->project (Case -2).

Case -1.Successful case.

0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[13015351])
00-04            UnionExchange
01-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
 | {
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "resultMode" : "EXEC"
  },

Case -2. Unsuccessful case:
  
0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
01-02          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
 | {
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "resultMode" : "EXEC"
  },
  
Thanks
Sudhakar Thota


On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <vi...@gmail.com> wrote:

> Ok, will do. Thanks.
> 
> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
> aengelbrecht@maprtech.com> wrote:
> 
>> Compare the query plans and you probably want to look at the log file to
>> see what fails and post here.
>> 
>> 
>> 
>> —Andries
>> 
>> 
>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vi...@gmail.com>
>> wrote:
>> 
>>> Is this a bug?
>>> 
>>> Created a parquet table (using CTAS) with one column containing text
>>> timestamps.
>>> 
>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>> +------------+
>>> |     t      |
>>> +------------+
>>> | 2015-01-27T13:43:53.000Z |
>>> +------------+
>>> 1 row selected (0.119 seconds)
>>> 
>>> The below queries, identical apart from the limit clause, behave
>>> differently. The one with the limit clause works, the one without
>> doesn't.
>>> The limit is larger than the total number of rows, so in both cases we
>>> should be processing all rows.
>>> 
>>> No limit clause. It fails:
>>> 
>>> ```
>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as
>> t;
>>> Query failed: RemoteRpcException: Failure while trying to start remote
>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
>>> 192.168.99.1:31010 ]
>>> ```
>>> 
>>> Limit clause in the subselect (larger than the number of rows in the
>> table)
>>> succeeds.
>>> 
>>> ```
>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
>>> 100000000) as t;
>>> ...
>>> | 2015-02-17 07:18:00.0 |
>>> +------------+
>>> 13,015,350 rows selected (105.257 seconds)
>>> ```
>>> 
>>> Data can be downloaded here:
>>> 
>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>> 
>>

Re: more on parsing timestamps

Posted by Vince Gonzalez <vi...@gmail.com>.

Ok, will do. Thanks.

On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
aengelbrecht@maprtech.com> wrote:

> Compare the query plans and you probably want to look at the log file to
> see what fails and post here.
>
>
>
> —Andries
>
>
> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vi...@gmail.com>
> wrote:
>
> > Is this a bug?
> >
> > Created a parquet table (using CTAS) with one column containing text
> > timestamps.
> >
> > 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
> > +------------+
> > |     t      |
> > +------------+
> > | 2015-01-27T13:43:53.000Z |
> > +------------+
> > 1 row selected (0.119 seconds)
> >
> > The below queries, identical apart from the limit clause, behave
> > differently. The one with the limit clause works, the one without
> doesn't.
> > The limit is larger than the total number of rows, so in both cases we
> > should be processing all rows.
> >
> > No limit clause. It fails:
> >
> > ```
> > 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
> > 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as
> t;
> > Query failed: RemoteRpcException: Failure while trying to start remote
> > fragment, Expression has syntax error! line 1:30:mismatched input 'T'
> > expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
> > 192.168.99.1:31010 ]
> > ```
> >
> > Limit clause in the subselect (larger than the number of rows in the
> table)
> > succeeds.
> >
> > ```
> > 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
> > 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
> > 100000000) as t;
> > ...
> > | 2015-02-17 07:18:00.0 |
> > +------------+
> > 13,015,350 rows selected (105.257 seconds)
> > ```
> >
> > Data can be downloaded here:
> >
> > https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>
>

Re: more on parsing timestamps

Posted by Andries Engelbrecht <ae...@maprtech.com>.

Compare the query plans and you probably want to look at the log file to see what fails and post here.



—Andries


On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vi...@gmail.com> wrote:

> Is this a bug?
> 
> Created a parquet table (using CTAS) with one column containing text
> timestamps.
> 
> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
> +------------+
> |     t      |
> +------------+
> | 2015-01-27T13:43:53.000Z |
> +------------+
> 1 row selected (0.119 seconds)
> 
> The below queries, identical apart from the limit clause, behave
> differently. The one with the limit clause works, the one without doesn't.
> The limit is larger than the total number of rows, so in both cases we
> should be processing all rows.
> 
> No limit clause. It fails:
> 
> ```
> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
> Query failed: RemoteRpcException: Failure while trying to start remote
> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
> 192.168.99.1:31010 ]
> ```
> 
> Limit clause in the subselect (larger than the number of rows in the table)
> succeeds.
> 
> ```
> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
> 100000000) as t;
> ...
> | 2015-02-17 07:18:00.0 |
> +------------+
> 13,015,350 rows selected (105.257 seconds)
> ```
> 
> Data can be downloaded here:
> 
> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz