You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Richard Shaw <rs...@maprtech.com> on 2014/11/11 00:02:41 UTC

Sub-select

Hi, please can you confirm if the following type of sub-select query is
possible or how else to go about it.

Case 1:

Select …….
>From fish_data
Where  “fish_id”  in (select  “fish_id”  from ….. where ….)

Case 2:

Select  …
>From fish_data  fishd
Join (select … from  where …..)  fishtime   on fishtime.fishid =
fishd.fishid

Based on an existing query

MS SQL Example:

SELECT TOP 1
            fishId,
            datetime AS dataStart
FROM
            fish_data
WHERE
            fishId = '353234025714222'
            AND DATEDIFF(
                         dd,
                         (SELECT MIN(cake.datetime) AS minDate FROM
fish_data AS cake WHERE cake.fishId = '353234025714222'),
                         datetime
            ) > 5
ORDER BY
            datetime ASC

Thank you

Kind Regards,
Richard

Re: Sub-select

Posted by Ted Dunning <te...@gmail.com>.
Also, there are some limitations on using joins and in queries on many data
sources.

For instance, with HBase or MapR DB, the IN will result in a full table
scan even if the left hand side of the IN is the key.  Hakim is working on
a UDF to simulate the desired behavior until the HBase data source can be
extended with a magic join operator.



On Mon, Nov 10, 2014 at 6:11 PM, Jinfeng Ni <ji...@gmail.com> wrote:

> The In subquery and Join Subquery should work on both 0.6 and 0.7.  For
> example, TPCH Q18 has an IN-subquery and runs fine on both version.
>
> SELECT ...
> .....
> where
>   o.o_orderkey in (
>     select
>       l_orderkey
>     from
>       cp.`tpch/lineitem.parquet`
>     group by
>       l_orderkey having
>         sum(l_quantity) > 300
>   )
> ...
>
> On the other hand, there is some known issue when IN-subquery is applied to
> complex data type column (See DRILL-1397).  Are you using complex data type
> column in your IN-SUBQ? If possible, can you post a failing query example (
> modify the query to remove any sensitive information)?  We would like to
> take a look at what the failing query look like.
>
>
>
>
>
>
>
> On Mon, Nov 10, 2014 at 3:24 PM, Hao Zhu <hz...@maprtech.com> wrote:
>
> > Confirmed. The same on 0.6.0.
> >
> > On Mon, Nov 10, 2014 at 3:16 PM, Richard Shaw <rs...@maprtech.com>
> wrote:
> >
> > > Please can you confirm the subquery doesn't work on 0.6.0?
> > >
> > > Thank you
> > >
> > > On 10 November 2014 23:12, Hao Zhu <hz...@maprtech.com> wrote:
> > >
> > >> Both subquery should work, see below tests in 0.7:
> > >>
> > >> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
> > >>> +------------+------------+
> > >>> |   EXPR$0   |   EXPR$1   |
> > >>> +------------+------------+
> > >>> | 1          | a          |
> > >>> | 2          | b          |
> > >>> | 3          | c          |
> > >>> +------------+------------+
> > >>> 3 rows selected (0.082 seconds)
> > >>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
> > >>> +------------+------------+
> > >>> |   EXPR$0   |   EXPR$1   |
> > >>> +------------+------------+
> > >>> | 2          | b          |
> > >>> | 3          | c          |
> > >>> +------------+------------+
> > >>> 2 rows selected (0.088 seconds)
> > >>> 0: jdbc:drill:zk=local> select columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
> > >>> . . . . . . . . . . . > columns[0] in (select columns[0] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
> > >>> +------------+
> > >>> |   EXPR$0   |
> > >>> +------------+
> > >>> | b          |
> > >>> | c          |
> > >>> +------------+
> > >>> 2 rows selected (0.586 seconds)
> > >>> 0: jdbc:drill:zk=local>
> > >>> 0: jdbc:drill:zk=local> select a.columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
> > >>> . . . . . . . . . . . >
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
> > >>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
> > >>> +------------+
> > >>> |   EXPR$0   |
> > >>> +------------+
> > >>> | b          |
> > >>> | c          |
> > >>> +------------+
> > >>> 2 rows selected (0.356 seconds)
> > >>
> > >>
> > >>
> > >> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com>
> > wrote:
> > >>
> > >>> Hi, please can you confirm if the following type of sub-select query
> is
> > >>> possible or how else to go about it.
> > >>>
> > >>> Case 1:
> > >>>
> > >>> Select …….
> > >>> From fish_data
> > >>> Where  “fish_id”  in (select  “fish_id”  from ….. where ….)
> > >>>
> > >>> Case 2:
> > >>>
> > >>> Select  …
> > >>> From fish_data  fishd
> > >>> Join (select … from  where …..)  fishtime   on fishtime.fishid =
> > >>> fishd.fishid
> > >>>
> > >>> Based on an existing query
> > >>>
> > >>> MS SQL Example:
> > >>>
> > >>> SELECT TOP 1
> > >>>             fishId,
> > >>>             datetime AS dataStart
> > >>> FROM
> > >>>             fish_data
> > >>> WHERE
> > >>>             fishId = '353234025714222'
> > >>>             AND DATEDIFF(
> > >>>                          dd,
> > >>>                          (SELECT MIN(cake.datetime) AS minDate FROM
> > >>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
> > >>>                          datetime
> > >>>             ) > 5
> > >>> ORDER BY
> > >>>             datetime ASC
> > >>>
> > >>> Thank you
> > >>>
> > >>> Kind Regards,
> > >>> Richard
> > >>>
> > >>
> > >>
> > >
> >
>

Re: Sub-select

Posted by Jinfeng Ni <ji...@gmail.com>.
The In subquery and Join Subquery should work on both 0.6 and 0.7.  For
example, TPCH Q18 has an IN-subquery and runs fine on both version.

SELECT ...
.....
where
  o.o_orderkey in (
    select
      l_orderkey
    from
      cp.`tpch/lineitem.parquet`
    group by
      l_orderkey having
        sum(l_quantity) > 300
  )
...

On the other hand, there is some known issue when IN-subquery is applied to
complex data type column (See DRILL-1397).  Are you using complex data type
column in your IN-SUBQ? If possible, can you post a failing query example (
modify the query to remove any sensitive information)?  We would like to
take a look at what the failing query look like.







On Mon, Nov 10, 2014 at 3:24 PM, Hao Zhu <hz...@maprtech.com> wrote:

> Confirmed. The same on 0.6.0.
>
> On Mon, Nov 10, 2014 at 3:16 PM, Richard Shaw <rs...@maprtech.com> wrote:
>
> > Please can you confirm the subquery doesn't work on 0.6.0?
> >
> > Thank you
> >
> > On 10 November 2014 23:12, Hao Zhu <hz...@maprtech.com> wrote:
> >
> >> Both subquery should work, see below tests in 0.7:
> >>
> >> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
> >>> +------------+------------+
> >>> |   EXPR$0   |   EXPR$1   |
> >>> +------------+------------+
> >>> | 1          | a          |
> >>> | 2          | b          |
> >>> | 3          | c          |
> >>> +------------+------------+
> >>> 3 rows selected (0.082 seconds)
> >>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
> >>> +------------+------------+
> >>> |   EXPR$0   |   EXPR$1   |
> >>> +------------+------------+
> >>> | 2          | b          |
> >>> | 3          | c          |
> >>> +------------+------------+
> >>> 2 rows selected (0.088 seconds)
> >>> 0: jdbc:drill:zk=local> select columns[1] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
> >>> . . . . . . . . . . . > columns[0] in (select columns[0] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
> >>> +------------+
> >>> |   EXPR$0   |
> >>> +------------+
> >>> | b          |
> >>> | c          |
> >>> +------------+
> >>> 2 rows selected (0.586 seconds)
> >>> 0: jdbc:drill:zk=local>
> >>> 0: jdbc:drill:zk=local> select a.columns[1] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
> >>> . . . . . . . . . . . >
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
> >>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
> >>> +------------+
> >>> |   EXPR$0   |
> >>> +------------+
> >>> | b          |
> >>> | c          |
> >>> +------------+
> >>> 2 rows selected (0.356 seconds)
> >>
> >>
> >>
> >> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com>
> wrote:
> >>
> >>> Hi, please can you confirm if the following type of sub-select query is
> >>> possible or how else to go about it.
> >>>
> >>> Case 1:
> >>>
> >>> Select …….
> >>> From fish_data
> >>> Where  “fish_id”  in (select  “fish_id”  from ….. where ….)
> >>>
> >>> Case 2:
> >>>
> >>> Select  …
> >>> From fish_data  fishd
> >>> Join (select … from  where …..)  fishtime   on fishtime.fishid =
> >>> fishd.fishid
> >>>
> >>> Based on an existing query
> >>>
> >>> MS SQL Example:
> >>>
> >>> SELECT TOP 1
> >>>             fishId,
> >>>             datetime AS dataStart
> >>> FROM
> >>>             fish_data
> >>> WHERE
> >>>             fishId = '353234025714222'
> >>>             AND DATEDIFF(
> >>>                          dd,
> >>>                          (SELECT MIN(cake.datetime) AS minDate FROM
> >>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
> >>>                          datetime
> >>>             ) > 5
> >>> ORDER BY
> >>>             datetime ASC
> >>>
> >>> Thank you
> >>>
> >>> Kind Regards,
> >>> Richard
> >>>
> >>
> >>
> >
>

Re: Sub-select

Posted by Hao Zhu <hz...@maprtech.com>.
Confirmed. The same on 0.6.0.

On Mon, Nov 10, 2014 at 3:16 PM, Richard Shaw <rs...@maprtech.com> wrote:

> Please can you confirm the subquery doesn't work on 0.6.0?
>
> Thank you
>
> On 10 November 2014 23:12, Hao Zhu <hz...@maprtech.com> wrote:
>
>> Both subquery should work, see below tests in 0.7:
>>
>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
>>> +------------+------------+
>>> |   EXPR$0   |   EXPR$1   |
>>> +------------+------------+
>>> | 1          | a          |
>>> | 2          | b          |
>>> | 3          | c          |
>>> +------------+------------+
>>> 3 rows selected (0.082 seconds)
>>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
>>> +------------+------------+
>>> |   EXPR$0   |   EXPR$1   |
>>> +------------+------------+
>>> | 2          | b          |
>>> | 3          | c          |
>>> +------------+------------+
>>> 2 rows selected (0.088 seconds)
>>> 0: jdbc:drill:zk=local> select columns[1] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
>>> . . . . . . . . . . . > columns[0] in (select columns[0] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> | b          |
>>> | c          |
>>> +------------+
>>> 2 rows selected (0.586 seconds)
>>> 0: jdbc:drill:zk=local>
>>> 0: jdbc:drill:zk=local> select a.columns[1] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
>>> . . . . . . . . . . . >
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
>>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> | b          |
>>> | c          |
>>> +------------+
>>> 2 rows selected (0.356 seconds)
>>
>>
>>
>> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com> wrote:
>>
>>> Hi, please can you confirm if the following type of sub-select query is
>>> possible or how else to go about it.
>>>
>>> Case 1:
>>>
>>> Select …….
>>> From fish_data
>>> Where  “fish_id”  in (select  “fish_id”  from ….. where ….)
>>>
>>> Case 2:
>>>
>>> Select  …
>>> From fish_data  fishd
>>> Join (select … from  where …..)  fishtime   on fishtime.fishid =
>>> fishd.fishid
>>>
>>> Based on an existing query
>>>
>>> MS SQL Example:
>>>
>>> SELECT TOP 1
>>>             fishId,
>>>             datetime AS dataStart
>>> FROM
>>>             fish_data
>>> WHERE
>>>             fishId = '353234025714222'
>>>             AND DATEDIFF(
>>>                          dd,
>>>                          (SELECT MIN(cake.datetime) AS minDate FROM
>>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
>>>                          datetime
>>>             ) > 5
>>> ORDER BY
>>>             datetime ASC
>>>
>>> Thank you
>>>
>>> Kind Regards,
>>> Richard
>>>
>>
>>
>

Re: Sub-select

Posted by Richard Shaw <rs...@maprtech.com>.
Please can you confirm the subquery doesn't work on 0.6.0?

Thank you

On 10 November 2014 23:12, Hao Zhu <hz...@maprtech.com> wrote:

> Both subquery should work, see below tests in 0.7:
>
> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
>> +------------+------------+
>> |   EXPR$0   |   EXPR$1   |
>> +------------+------------+
>> | 1          | a          |
>> | 2          | b          |
>> | 3          | c          |
>> +------------+------------+
>> 3 rows selected (0.082 seconds)
>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
>> +------------+------------+
>> |   EXPR$0   |   EXPR$1   |
>> +------------+------------+
>> | 2          | b          |
>> | 3          | c          |
>> +------------+------------+
>> 2 rows selected (0.088 seconds)
>> 0: jdbc:drill:zk=local> select columns[1] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
>> . . . . . . . . . . . > columns[0] in (select columns[0] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | b          |
>> | c          |
>> +------------+
>> 2 rows selected (0.586 seconds)
>> 0: jdbc:drill:zk=local>
>> 0: jdbc:drill:zk=local> select a.columns[1] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
>> . . . . . . . . . . . >
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | b          |
>> | c          |
>> +------------+
>> 2 rows selected (0.356 seconds)
>
>
>
> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com> wrote:
>
>> Hi, please can you confirm if the following type of sub-select query is
>> possible or how else to go about it.
>>
>> Case 1:
>>
>> Select …….
>> From fish_data
>> Where  “fish_id”  in (select  “fish_id”  from ….. where ….)
>>
>> Case 2:
>>
>> Select  …
>> From fish_data  fishd
>> Join (select … from  where …..)  fishtime   on fishtime.fishid =
>> fishd.fishid
>>
>> Based on an existing query
>>
>> MS SQL Example:
>>
>> SELECT TOP 1
>>             fishId,
>>             datetime AS dataStart
>> FROM
>>             fish_data
>> WHERE
>>             fishId = '353234025714222'
>>             AND DATEDIFF(
>>                          dd,
>>                          (SELECT MIN(cake.datetime) AS minDate FROM
>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
>>                          datetime
>>             ) > 5
>> ORDER BY
>>             datetime ASC
>>
>> Thank you
>>
>> Kind Regards,
>> Richard
>>
>
>

Re: Sub-select

Posted by Hao Zhu <hz...@maprtech.com>.
Both subquery should work, see below tests in 0.7:

0: jdbc:drill:zk=local> select columns[0],columns[1] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
> +------------+------------+
> |   EXPR$0   |   EXPR$1   |
> +------------+------------+
> | 1          | a          |
> | 2          | b          |
> | 3          | c          |
> +------------+------------+
> 3 rows selected (0.082 seconds)
> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
> +------------+------------+
> |   EXPR$0   |   EXPR$1   |
> +------------+------------+
> | 2          | b          |
> | 3          | c          |
> +------------+------------+
> 2 rows selected (0.088 seconds)
> 0: jdbc:drill:zk=local> select columns[1] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
> . . . . . . . . . . . > columns[0] in (select columns[0] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
> +------------+
> |   EXPR$0   |
> +------------+
> | b          |
> | c          |
> +------------+
> 2 rows selected (0.586 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> select a.columns[1] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
> . . . . . . . . . . . >
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
> +------------+
> |   EXPR$0   |
> +------------+
> | b          |
> | c          |
> +------------+
> 2 rows selected (0.356 seconds)



On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com> wrote:

> Hi, please can you confirm if the following type of sub-select query is
> possible or how else to go about it.
>
> Case 1:
>
> Select …….
> From fish_data
> Where  “fish_id”  in (select  “fish_id”  from ….. where ….)
>
> Case 2:
>
> Select  …
> From fish_data  fishd
> Join (select … from  where …..)  fishtime   on fishtime.fishid =
> fishd.fishid
>
> Based on an existing query
>
> MS SQL Example:
>
> SELECT TOP 1
>             fishId,
>             datetime AS dataStart
> FROM
>             fish_data
> WHERE
>             fishId = '353234025714222'
>             AND DATEDIFF(
>                          dd,
>                          (SELECT MIN(cake.datetime) AS minDate FROM
> fish_data AS cake WHERE cake.fishId = '353234025714222'),
>                          datetime
>             ) > 5
> ORDER BY
>             datetime ASC
>
> Thank you
>
> Kind Regards,
> Richard
>