You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Richard Shaw <rs...@maprtech.com> on 2014/11/11 00:02:41 UTC
Sub-select
Hi, please can you confirm if the following type of sub-select query is
possible or how else to go about it.
Case 1:
Select …….
>From fish_data
Where “fish_id” in (select “fish_id” from ….. where ….)
Case 2:
Select …
>From fish_data fishd
Join (select … from where …..) fishtime on fishtime.fishid =
fishd.fishid
Based on an existing query
MS SQL Example:
SELECT TOP 1
fishId,
datetime AS dataStart
FROM
fish_data
WHERE
fishId = '353234025714222'
AND DATEDIFF(
dd,
(SELECT MIN(cake.datetime) AS minDate FROM
fish_data AS cake WHERE cake.fishId = '353234025714222'),
datetime
) > 5
ORDER BY
datetime ASC
Thank you
Kind Regards,
Richard
Re: Sub-select
Posted by Ted Dunning <te...@gmail.com>.
Also, there are some limitations on using joins and in queries on many data
sources.
For instance, with HBase or MapR DB, the IN will result in a full table
scan even if the left hand side of the IN is the key. Hakim is working on
a UDF to simulate the desired behavior until the HBase data source can be
extended with a magic join operator.
On Mon, Nov 10, 2014 at 6:11 PM, Jinfeng Ni <ji...@gmail.com> wrote:
> The In subquery and Join Subquery should work on both 0.6 and 0.7. For
> example, TPCH Q18 has an IN-subquery and runs fine on both version.
>
> SELECT ...
> .....
> where
> o.o_orderkey in (
> select
> l_orderkey
> from
> cp.`tpch/lineitem.parquet`
> group by
> l_orderkey having
> sum(l_quantity) > 300
> )
> ...
>
> On the other hand, there is some known issue when IN-subquery is applied to
> complex data type column (See DRILL-1397). Are you using complex data type
> column in your IN-SUBQ? If possible, can you post a failing query example (
> modify the query to remove any sensitive information)? We would like to
> take a look at what the failing query look like.
>
>
>
>
>
>
>
> On Mon, Nov 10, 2014 at 3:24 PM, Hao Zhu <hz...@maprtech.com> wrote:
>
> > Confirmed. The same on 0.6.0.
> >
> > On Mon, Nov 10, 2014 at 3:16 PM, Richard Shaw <rs...@maprtech.com>
> wrote:
> >
> > > Please can you confirm the subquery doesn't work on 0.6.0?
> > >
> > > Thank you
> > >
> > > On 10 November 2014 23:12, Hao Zhu <hz...@maprtech.com> wrote:
> > >
> > >> Both subquery should work, see below tests in 0.7:
> > >>
> > >> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
> > >>> +------------+------------+
> > >>> | EXPR$0 | EXPR$1 |
> > >>> +------------+------------+
> > >>> | 1 | a |
> > >>> | 2 | b |
> > >>> | 3 | c |
> > >>> +------------+------------+
> > >>> 3 rows selected (0.082 seconds)
> > >>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
> > >>> +------------+------------+
> > >>> | EXPR$0 | EXPR$1 |
> > >>> +------------+------------+
> > >>> | 2 | b |
> > >>> | 3 | c |
> > >>> +------------+------------+
> > >>> 2 rows selected (0.088 seconds)
> > >>> 0: jdbc:drill:zk=local> select columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
> > >>> . . . . . . . . . . . > columns[0] in (select columns[0] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
> > >>> +------------+
> > >>> | EXPR$0 |
> > >>> +------------+
> > >>> | b |
> > >>> | c |
> > >>> +------------+
> > >>> 2 rows selected (0.586 seconds)
> > >>> 0: jdbc:drill:zk=local>
> > >>> 0: jdbc:drill:zk=local> select a.columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
> > >>> . . . . . . . . . . . >
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
> > >>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
> > >>> +------------+
> > >>> | EXPR$0 |
> > >>> +------------+
> > >>> | b |
> > >>> | c |
> > >>> +------------+
> > >>> 2 rows selected (0.356 seconds)
> > >>
> > >>
> > >>
> > >> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com>
> > wrote:
> > >>
> > >>> Hi, please can you confirm if the following type of sub-select query
> is
> > >>> possible or how else to go about it.
> > >>>
> > >>> Case 1:
> > >>>
> > >>> Select …….
> > >>> From fish_data
> > >>> Where “fish_id” in (select “fish_id” from ….. where ….)
> > >>>
> > >>> Case 2:
> > >>>
> > >>> Select …
> > >>> From fish_data fishd
> > >>> Join (select … from where …..) fishtime on fishtime.fishid =
> > >>> fishd.fishid
> > >>>
> > >>> Based on an existing query
> > >>>
> > >>> MS SQL Example:
> > >>>
> > >>> SELECT TOP 1
> > >>> fishId,
> > >>> datetime AS dataStart
> > >>> FROM
> > >>> fish_data
> > >>> WHERE
> > >>> fishId = '353234025714222'
> > >>> AND DATEDIFF(
> > >>> dd,
> > >>> (SELECT MIN(cake.datetime) AS minDate FROM
> > >>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
> > >>> datetime
> > >>> ) > 5
> > >>> ORDER BY
> > >>> datetime ASC
> > >>>
> > >>> Thank you
> > >>>
> > >>> Kind Regards,
> > >>> Richard
> > >>>
> > >>
> > >>
> > >
> >
>
Re: Sub-select
Posted by Jinfeng Ni <ji...@gmail.com>.
The In subquery and Join Subquery should work on both 0.6 and 0.7. For
example, TPCH Q18 has an IN-subquery and runs fine on both version.
SELECT ...
.....
where
o.o_orderkey in (
select
l_orderkey
from
cp.`tpch/lineitem.parquet`
group by
l_orderkey having
sum(l_quantity) > 300
)
...
On the other hand, there is some known issue when IN-subquery is applied to
complex data type column (See DRILL-1397). Are you using complex data type
column in your IN-SUBQ? If possible, can you post a failing query example (
modify the query to remove any sensitive information)? We would like to
take a look at what the failing query look like.
On Mon, Nov 10, 2014 at 3:24 PM, Hao Zhu <hz...@maprtech.com> wrote:
> Confirmed. The same on 0.6.0.
>
> On Mon, Nov 10, 2014 at 3:16 PM, Richard Shaw <rs...@maprtech.com> wrote:
>
> > Please can you confirm the subquery doesn't work on 0.6.0?
> >
> > Thank you
> >
> > On 10 November 2014 23:12, Hao Zhu <hz...@maprtech.com> wrote:
> >
> >> Both subquery should work, see below tests in 0.7:
> >>
> >> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
> >>> +------------+------------+
> >>> | EXPR$0 | EXPR$1 |
> >>> +------------+------------+
> >>> | 1 | a |
> >>> | 2 | b |
> >>> | 3 | c |
> >>> +------------+------------+
> >>> 3 rows selected (0.082 seconds)
> >>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
> >>> +------------+------------+
> >>> | EXPR$0 | EXPR$1 |
> >>> +------------+------------+
> >>> | 2 | b |
> >>> | 3 | c |
> >>> +------------+------------+
> >>> 2 rows selected (0.088 seconds)
> >>> 0: jdbc:drill:zk=local> select columns[1] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
> >>> . . . . . . . . . . . > columns[0] in (select columns[0] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
> >>> +------------+
> >>> | EXPR$0 |
> >>> +------------+
> >>> | b |
> >>> | c |
> >>> +------------+
> >>> 2 rows selected (0.586 seconds)
> >>> 0: jdbc:drill:zk=local>
> >>> 0: jdbc:drill:zk=local> select a.columns[1] from
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
> >>> . . . . . . . . . . . >
> >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
> >>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
> >>> +------------+
> >>> | EXPR$0 |
> >>> +------------+
> >>> | b |
> >>> | c |
> >>> +------------+
> >>> 2 rows selected (0.356 seconds)
> >>
> >>
> >>
> >> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com>
> wrote:
> >>
> >>> Hi, please can you confirm if the following type of sub-select query is
> >>> possible or how else to go about it.
> >>>
> >>> Case 1:
> >>>
> >>> Select …….
> >>> From fish_data
> >>> Where “fish_id” in (select “fish_id” from ….. where ….)
> >>>
> >>> Case 2:
> >>>
> >>> Select …
> >>> From fish_data fishd
> >>> Join (select … from where …..) fishtime on fishtime.fishid =
> >>> fishd.fishid
> >>>
> >>> Based on an existing query
> >>>
> >>> MS SQL Example:
> >>>
> >>> SELECT TOP 1
> >>> fishId,
> >>> datetime AS dataStart
> >>> FROM
> >>> fish_data
> >>> WHERE
> >>> fishId = '353234025714222'
> >>> AND DATEDIFF(
> >>> dd,
> >>> (SELECT MIN(cake.datetime) AS minDate FROM
> >>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
> >>> datetime
> >>> ) > 5
> >>> ORDER BY
> >>> datetime ASC
> >>>
> >>> Thank you
> >>>
> >>> Kind Regards,
> >>> Richard
> >>>
> >>
> >>
> >
>
Re: Sub-select
Posted by Hao Zhu <hz...@maprtech.com>.
Confirmed. The same on 0.6.0.
On Mon, Nov 10, 2014 at 3:16 PM, Richard Shaw <rs...@maprtech.com> wrote:
> Please can you confirm the subquery doesn't work on 0.6.0?
>
> Thank you
>
> On 10 November 2014 23:12, Hao Zhu <hz...@maprtech.com> wrote:
>
>> Both subquery should work, see below tests in 0.7:
>>
>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
>>> +------------+------------+
>>> | EXPR$0 | EXPR$1 |
>>> +------------+------------+
>>> | 1 | a |
>>> | 2 | b |
>>> | 3 | c |
>>> +------------+------------+
>>> 3 rows selected (0.082 seconds)
>>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
>>> +------------+------------+
>>> | EXPR$0 | EXPR$1 |
>>> +------------+------------+
>>> | 2 | b |
>>> | 3 | c |
>>> +------------+------------+
>>> 2 rows selected (0.088 seconds)
>>> 0: jdbc:drill:zk=local> select columns[1] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
>>> . . . . . . . . . . . > columns[0] in (select columns[0] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
>>> +------------+
>>> | EXPR$0 |
>>> +------------+
>>> | b |
>>> | c |
>>> +------------+
>>> 2 rows selected (0.586 seconds)
>>> 0: jdbc:drill:zk=local>
>>> 0: jdbc:drill:zk=local> select a.columns[1] from
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
>>> . . . . . . . . . . . >
>>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
>>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
>>> +------------+
>>> | EXPR$0 |
>>> +------------+
>>> | b |
>>> | c |
>>> +------------+
>>> 2 rows selected (0.356 seconds)
>>
>>
>>
>> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com> wrote:
>>
>>> Hi, please can you confirm if the following type of sub-select query is
>>> possible or how else to go about it.
>>>
>>> Case 1:
>>>
>>> Select …….
>>> From fish_data
>>> Where “fish_id” in (select “fish_id” from ….. where ….)
>>>
>>> Case 2:
>>>
>>> Select …
>>> From fish_data fishd
>>> Join (select … from where …..) fishtime on fishtime.fishid =
>>> fishd.fishid
>>>
>>> Based on an existing query
>>>
>>> MS SQL Example:
>>>
>>> SELECT TOP 1
>>> fishId,
>>> datetime AS dataStart
>>> FROM
>>> fish_data
>>> WHERE
>>> fishId = '353234025714222'
>>> AND DATEDIFF(
>>> dd,
>>> (SELECT MIN(cake.datetime) AS minDate FROM
>>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
>>> datetime
>>> ) > 5
>>> ORDER BY
>>> datetime ASC
>>>
>>> Thank you
>>>
>>> Kind Regards,
>>> Richard
>>>
>>
>>
>
Re: Sub-select
Posted by Richard Shaw <rs...@maprtech.com>.
Please can you confirm the subquery doesn't work on 0.6.0?
Thank you
On 10 November 2014 23:12, Hao Zhu <hz...@maprtech.com> wrote:
> Both subquery should work, see below tests in 0.7:
>
> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
>> +------------+------------+
>> | EXPR$0 | EXPR$1 |
>> +------------+------------+
>> | 1 | a |
>> | 2 | b |
>> | 3 | c |
>> +------------+------------+
>> 3 rows selected (0.082 seconds)
>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
>> +------------+------------+
>> | EXPR$0 | EXPR$1 |
>> +------------+------------+
>> | 2 | b |
>> | 3 | c |
>> +------------+------------+
>> 2 rows selected (0.088 seconds)
>> 0: jdbc:drill:zk=local> select columns[1] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
>> . . . . . . . . . . . > columns[0] in (select columns[0] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
>> +------------+
>> | EXPR$0 |
>> +------------+
>> | b |
>> | c |
>> +------------+
>> 2 rows selected (0.586 seconds)
>> 0: jdbc:drill:zk=local>
>> 0: jdbc:drill:zk=local> select a.columns[1] from
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
>> . . . . . . . . . . . >
>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
>> +------------+
>> | EXPR$0 |
>> +------------+
>> | b |
>> | c |
>> +------------+
>> 2 rows selected (0.356 seconds)
>
>
>
> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com> wrote:
>
>> Hi, please can you confirm if the following type of sub-select query is
>> possible or how else to go about it.
>>
>> Case 1:
>>
>> Select …….
>> From fish_data
>> Where “fish_id” in (select “fish_id” from ….. where ….)
>>
>> Case 2:
>>
>> Select …
>> From fish_data fishd
>> Join (select … from where …..) fishtime on fishtime.fishid =
>> fishd.fishid
>>
>> Based on an existing query
>>
>> MS SQL Example:
>>
>> SELECT TOP 1
>> fishId,
>> datetime AS dataStart
>> FROM
>> fish_data
>> WHERE
>> fishId = '353234025714222'
>> AND DATEDIFF(
>> dd,
>> (SELECT MIN(cake.datetime) AS minDate FROM
>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
>> datetime
>> ) > 5
>> ORDER BY
>> datetime ASC
>>
>> Thank you
>>
>> Kind Regards,
>> Richard
>>
>
>
Re: Sub-select
Posted by Hao Zhu <hz...@maprtech.com>.
Both subquery should work, see below tests in 0.7:
0: jdbc:drill:zk=local> select columns[0],columns[1] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
> +------------+------------+
> | EXPR$0 | EXPR$1 |
> +------------+------------+
> | 1 | a |
> | 2 | b |
> | 3 | c |
> +------------+------------+
> 3 rows selected (0.082 seconds)
> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
> +------------+------------+
> | EXPR$0 | EXPR$1 |
> +------------+------------+
> | 2 | b |
> | 3 | c |
> +------------+------------+
> 2 rows selected (0.088 seconds)
> 0: jdbc:drill:zk=local> select columns[1] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
> . . . . . . . . . . . > columns[0] in (select columns[0] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
> +------------+
> | EXPR$0 |
> +------------+
> | b |
> | c |
> +------------+
> 2 rows selected (0.586 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> select a.columns[1] from
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
> . . . . . . . . . . . >
> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
> +------------+
> | EXPR$0 |
> +------------+
> | b |
> | c |
> +------------+
> 2 rows selected (0.356 seconds)
On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rs...@maprtech.com> wrote:
> Hi, please can you confirm if the following type of sub-select query is
> possible or how else to go about it.
>
> Case 1:
>
> Select …….
> From fish_data
> Where “fish_id” in (select “fish_id” from ….. where ….)
>
> Case 2:
>
> Select …
> From fish_data fishd
> Join (select … from where …..) fishtime on fishtime.fishid =
> fishd.fishid
>
> Based on an existing query
>
> MS SQL Example:
>
> SELECT TOP 1
> fishId,
> datetime AS dataStart
> FROM
> fish_data
> WHERE
> fishId = '353234025714222'
> AND DATEDIFF(
> dd,
> (SELECT MIN(cake.datetime) AS minDate FROM
> fish_data AS cake WHERE cake.fishId = '353234025714222'),
> datetime
> ) > 5
> ORDER BY
> datetime ASC
>
> Thank you
>
> Kind Regards,
> Richard
>