Posted to user@drill.apache.org by Jim Bates <jb...@maprtech.com> on 2014/11/25 05:00:02 UTC

6 to 7 min delay in closing query when pulling over multiple json files using drill-0.6.0.28642.r2-1.noarch

When I run a query against a single file with a limit of 1, it returns in
under a second:
select * FROM (
  select `dir0` as `city`,
         to_timestamp(`executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
         flatten(`stationBeanList`) as `stations`
  FROM `data`.`all_bikes`.`../bikes/chicago/bikestations/1416875401.json`
  limit 1) a
limit 1;
+------------+---------------+------------+
|    city    | executionTime |  stations  |
+------------+---------------+------------+
| null       | 2014-11-24 18:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":12,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":7,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |
+------------+---------------+------------+
1 row selected (0.567 seconds)

When I run the same query over a larger scope, the first row comes back in
about 3 sec, but the query does not close for another 6 or 7 minutes:
select * FROM (
  select `dir0` as `city`,
         to_timestamp(`executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
         flatten(`stationBeanList`) as `stations`
  FROM `data`.`all_bikes`.`../bikes`
  limit 1) a
limit 1;
+------------+---------------+------------+
|    city    | executionTime |  stations  |
+------------+---------------+------------+
| chicago    | 2014-11-17 23:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":8,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":11,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |  <--- At this point in 3 sec
+------------+---------------+------------+
1 row selected (496.05 seconds)

Any reason that might be?

Re: 6 to 7 min delay in closing query when pulling over multiple json files using drill-0.6.0.28642.r2-1.noarch

Posted by Suresh Ollala <so...@maprtech.com>.
You might be hitting DRILL-1681.

On Tue, Nov 25, 2014 at 7:02 PM, Jim Bates <jb...@maprtech.com> wrote:

> Didn't get a hit on this so I'm sending it for round 2...
>
> When executing a query to a specific file and limiting to 1 row returned
> the query returns in under a second. When keeping the same limit but
> increasing the scope to several directories of JSON files it returns the
> single row quickly but can take up to 7 to 10 min to "finish". That delay
> forces one to configure a timeout of 600 to 1200 sec in the ODBC connector
> or the query will fail.
>
> Any workarounds for this?
>
> Query to a single file:
> select * FROM (select `dir0` as `city`, to_timestamp(
> `executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
> flatten(`stationBeanList`) as `stations` FROM
>  `data`.`all_bikes`.`../bikes/chicago/bikestations/1416875401.json` limit
> 1) a limit 1;
> +------------+---------------+------------+
> |    city    | executionTime |  stations  |
> +------------+---------------+------------+
> | null       | 2014-11-24 18:29:01.0 | {"id":5,"stationName":"State St &
> Harrison
> St","availableDocks":12,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In
> Service","statusKey":1,"availableBikes":7,"stAddress1":"State St & Harrison
> St","stAddress2":"","city":"","postalCode":"","location":"620 S. State
> St.","altitude":"","testStation":false,"landMark":"030"} |
> +------------+---------------+------------+
> 1 row selected (0.542 seconds)
>
> When executing over a larger scope it returns the first row in 3 sec but
> does not close the query for another 6 or 7 minutes:
> select * FROM (select `dir0` as `city`, to_timestamp(
> `executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
> flatten(`stationBeanList`) as `stations` FROM
>  `data`.`all_bikes`.`../bikes` limit 1) a limit 1;
> +------------+---------------+------------+
> |    city    | executionTime |  stations  |
> +------------+---------------+------------+
> | chicago    | 2014-11-17 23:29:01.0 | {"id":5,"stationName":"State St &
> Harrison
> St","availableDocks":8,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In
> Service","statusKey":1,"availableBikes":11,"stAddress1":"State St &
> Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S.
> State St.","altitude":"","testStation":false,"landMark":"030"} | * <--- At
> this point in 3 sec*
> +------------+---------------+------------+
> 1 row selected (683.15 seconds)
>

Re: 6 to 7 min delay in closing query when pulling over multiple json files using drill-0.6.0.28642.r2-1.noarch

Posted by Jim Bates <jb...@maprtech.com>.
Didn't get a hit on this so I'm sending it for round 2...

When I run a query against a specific file and limit it to 1 row, the
query returns in under a second. When I keep the same limit but widen the
scope to several directories of JSON files, the single row still comes back
quickly, but the query can take 7 to 10 min to "finish". That delay forces
one to configure a timeout of 600 to 1200 sec in the ODBC connector, or the
query will fail.

Any workarounds for this?

Query to a single file:
select * FROM (
  select `dir0` as `city`,
         to_timestamp(`executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
         flatten(`stationBeanList`) as `stations`
  FROM `data`.`all_bikes`.`../bikes/chicago/bikestations/1416875401.json`
  limit 1) a
limit 1;
+------------+---------------+------------+
|    city    | executionTime |  stations  |
+------------+---------------+------------+
| null       | 2014-11-24 18:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":12,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":7,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |
+------------+---------------+------------+
1 row selected (0.542 seconds)

When I run the same query over a larger scope, the first row comes back in
about 3 sec, but the query does not close for another 6 or 7 minutes:
select * FROM (
  select `dir0` as `city`,
         to_timestamp(`executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
         flatten(`stationBeanList`) as `stations`
  FROM `data`.`all_bikes`.`../bikes`
  limit 1) a
limit 1;
+------------+---------------+------------+
|    city    | executionTime |  stations  |
+------------+---------------+------------+
| chicago    | 2014-11-17 23:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":8,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":11,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |  <--- At this point in 3 sec
+------------+---------------+------------+
1 row selected (683.15 seconds)
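
For anyone comparing their own runs against a client timeout, here is a rough sketch (not part of Drill or of the ODBC driver) that pulls the elapsed time out of the "N row(s) selected (X seconds)" summary line and checks it against a timeout value. The function names and the 600 sec figure in the loop are only illustrative; the sample lines are the ones reported in this thread.

```python
import re

# Matches the elapsed time in a summary line such as
# "1 row selected (496.05 seconds)".
ELAPSED = re.compile(r"\((\d+(?:\.\d+)?) seconds\)")


def elapsed_seconds(summary_line):
    """Return the elapsed time in seconds, or None if the line has none."""
    match = ELAPSED.search(summary_line)
    return float(match.group(1)) if match else None


def exceeds_timeout(summary_line, timeout_seconds):
    """True if the reported elapsed time is longer than the client timeout."""
    elapsed = elapsed_seconds(summary_line)
    return elapsed is not None and elapsed > timeout_seconds


if __name__ == "__main__":
    # Summary lines copied from the runs reported in this thread.
    runs = [
        "1 row selected (0.542 seconds)",
        "1 row selected (496.05 seconds)",
        "1 row selected (683.15 seconds)",
    ]
    for line in runs:
        # 600 sec is the lower end of the timeout range mentioned above.
        print(line, "-> exceeds 600 sec timeout:", exceeds_timeout(line, 600))
```

With a 600 sec timeout the 683.15 sec run would be cut off on the client side even though the row itself arrived within a few seconds, which is why the timeout has to be raised toward the upper end of the range.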

