Posted to user@drill.apache.org by Rahul Raj <ra...@option3.io> on 2018/10/12 12:19:06 UTC

Drill JDBC Plugin limit queries

Hi,

Drill does not push LIMIT queries down to external databases, and I assume
this is more related to Calcite. It leads to out-of-memory situations when
querying a large table to view a few records. Is there something that
could be improved here? One solution would be to push filters down to the
DB and/or combine that with some JDBC batch size limit to flush parts as
Parquet.

Regards,
Rahul

-- 
_*** This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom it is 
addressed. If you are not the named addressee then you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately and delete this e-mail from your system.***_

Re: Drill JDBC Plugin limit queries

Posted by Vitalii Diravka <vi...@apache.org>.
Rahul,

*double rows* is an estimated row count, which can be used in choosing the
right Join operator, and maybe somewhere else.
But to have a proper *PushLimitIntoScan* it is necessary to change the
*String sql*.
Possibly it is necessary to keep the *JdbcImplementor* or
*JdbcImplementor.Result* from *JdbcPrel* in the *JdbcGroupScan* class
and change the sqlNode in the *applyLimit()* method.

Not sure why *DrillPushLimitToScanRule* is not matched. Is it added to the
planner program?
To find the reason you can compare the flow with the Parquet Scan, for
instance.
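To illustrate the rewrite step only, here is a standalone sketch. This is not the real JdbcGroupScan API: the class, the FETCH clause syntax, and the row-estimate handling are assumptions for illustration, mirroring the *supportsLimitPushdown()*/*applyLimit()* idea discussed in this thread.

```java
// Hypothetical sketch of limit pushdown for a JDBC scan. The real fix
// would rewrite the sqlNode kept from JdbcPrel; here we just show the
// effect on the generated SQL string and the planner row estimate.
public class LimitPushdownSketch {

    private final String sql;   // SQL pushed to the external database
    private final double rows;  // estimated row count used by the planner

    public LimitPushdownSketch(String sql, double estimatedRows) {
        this.sql = sql;
        this.rows = estimatedRows;
    }

    /** Mirrors the supportsLimitPushdown() idea: this scan accepts a limit. */
    public boolean supportsLimitPushdown() {
        return true;
    }

    /**
     * Mirrors the applyLimit(maxRecords) idea: wrap the pushed-down SQL
     * with a standard FETCH clause and shrink the row estimate so the
     * planner costs the scan as a limited one.
     */
    public LimitPushdownSketch applyLimit(int maxRecords) {
        String limited = sql.trim() + " FETCH NEXT " + maxRecords + " ROWS ONLY";
        return new LimitPushdownSketch(limited, Math.min(rows, maxRecords));
    }

    public String getSql() {
        return sql;
    }

    public double getRows() {
        return rows;
    }

    public static void main(String[] args) {
        LimitPushdownSketch scan =
            new LimitPushdownSketch("SELECT * FROM \"public\".\"actor\"", 100.0);
        System.out.println(scan.applyLimit(5).getSql());
    }
}
```

In real Drill code the safer route is the one described above: keep the JdbcImplementor.Result in JdbcGroupScan and modify the sqlNode, rather than concatenating onto a raw string.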


On Fri, Oct 19, 2018 at 7:37 PM Rahul Raj <ra...@option3.io> wrote:

> Vitalii,
>
> I made both the changes; it did not work and a full scan was issued as
> shown in the plan below.
>
> 00-00    Screen : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {120.5 rows, 165.5 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 227
> 00-01      Project(actor_id=[$0], first_name=[$1], last_name=[$2], last_update=[$3]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {120.0 rows, 165.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 226
> 00-02        SelectionVectorRemover : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {115.0 rows, 145.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 225
> 00-03          Limit(fetch=[5]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {110.0 rows, 140.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 224
> 00-04            Limit(fetch=[5]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {105.0 rows, 120.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 223
> 00-05              Jdbc(sql=[SELECT * FROM "public"."actor" ]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 164
>
> Regards,
> Rahul
>
> On Fri, Oct 19, 2018 at 8:47 PM Rahul Raj <ra...@option3.io> wrote:
>
> > I will make the changes and update you.
> >
> > Regards,
> > Rahul
> >
> > On Fri, Oct 19, 2018 at 1:05 AM Vitalii Diravka <vi...@apache.org>
> > wrote:
> >
> >> Rahul,
> >>
> >> Possibly *JdbcGroupScan* can be improved, for instance by overriding
> >> *supportsLimitPushdown()* and *applyLimit()* methods,
> >> *double rows *field can be updated by the limit value.
> >>
> >> I've performed the following query: select * from mysql.`testdb`.`table`
> >> limit 2;
> >> but the following one is passed to MySQL: SELECT * FROM `testdb`.`table`
> >>
> >> https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java#L187
> >> So it definitely should be improved.
> >>
> >> *Note:* Changed mailing list to devs.
> >>
> >> On Sun, Oct 14, 2018 at 6:30 AM Rahul Raj <ra...@option3.io> wrote:
> >>
> >> > Vitalii,
> >> >
> >> > Created documentation ticket DRILL-6794
> >> >
> >> > How do we proceed on extending the scan operators to support JDBC
> >> plugins?
> >> >
> >> > Regards,
> >> > Rahul
> >> >
> >> > On Sat, Oct 13, 2018 at 6:47 PM Vitalii Diravka <vi...@apache.org>
> >> > wrote:
> >> >
> >> > > To update the documentation, since those issues were solved by using
> >> > > these properties in the connection URL:
> >> > > defaultRowFetchSize=10000  [1]
> >> > > defaultAutoCommit=false    [2]
> >> > > The full URL was "url": "jdbc:postgresql://myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000"
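For reference, such a URL sits in the rdbms storage plugin configuration roughly like this (a sketch; the host, database, and credentials are placeholders):

```json
{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://myhost.mydomain.com/mydb?defaultAutoCommit=false&defaultRowFetchSize=10000",
  "username": "user",
  "password": "password",
  "enabled": true
}
```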
> >> > >
> >> > > If some issues are still present, it is also reasonable to create
> >> > > tickets to track them.
> >> > >
> >> > > [1]
> >> > > https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
> >> > > [2]
> >> > > https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E
> >> > >
> >> > > On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj <ra...@option3.io>
> >> wrote:
> >> > >
> >> > > > Should I create tickets to track these issues or should I create
> >> > > > a ticket to update the documentation?
> >> > > >
> >> > > > Rahul
> >> > > >
> >> > > > On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka <
> vitalii@apache.org
> >> >
> >> > > > wrote:
> >> > > >
> >> > > > > 1. You are right, it means it is reasonable to extend this rule
> >> > > > > for applying to other Scan operators (or possibly to create a
> >> > > > > separate one).
> >> > > > > 2. There was a question about OOM issues with Drill + PostgreSQL,
> >> > > > > please take a look [1].
> >> > > > >     Since you are trying to set up these configs, it would be
> >> > > > > good if you create a Jira ticket to add this info to the Drill
> >> > > > > docs [2]
> >> > > > >
> >> > > > > [1]
> >> > > > > https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> >> > > > > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> >> > > > >
> >> > > > > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj <rahul.raj@option3.io
> >
> >> > > wrote:
> >> > > > >
> >> > > > > > Regarding the heap out-of-memory error, it could be that the
> >> > > > > > jdbc driver is prefetching the entire record set into memory.
> >> > > > > > I just had a look at JdbcRecordReader; it looks like by
> >> > > > > > setting connection#autoCommit(false) and a sufficient fetch
> >> > > > > > size we could force the driver to stream data as required.
> >> > > > > > This is how the Postgres driver works.
> >> > > > > >
> >> > > > > > https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
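A small sketch of the URL-property approach quoted earlier in this thread (the host and database are placeholders; defaultRowFetchSize and defaultAutoCommit are the property names mentioned above, not something I have verified beyond this thread):

```java
// Sketch: append the streaming-related driver properties to a base JDBC
// URL, so the PostgreSQL driver fetches rows in batches instead of
// buffering the whole result set on the Java heap.
public class StreamingUrlSketch {

    /** Add cursor-streaming properties to a JDBC URL, query-string style. */
    public static String withStreaming(String baseUrl, int fetchSize) {
        String sep = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + sep + "defaultRowFetchSize=" + fetchSize
                + "&defaultAutoCommit=false";
    }

    public static void main(String[] args) {
        // The programmatic equivalent inside JdbcRecordReader would be:
        //   connection.setAutoCommit(false);
        //   statement.setFetchSize(10000);
        System.out.println(withStreaming("jdbc:postgresql://localhost/mydb", 10000));
    }
}
```

Either form should have the same effect for PostgreSQL; other drivers, as noted below, may need their own settings.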
> >> > > > > >
> >> > > > > > We will have to see the behaviour of other drivers too.
> >> > > > > >
> >> > > > > > Let me know your thoughts here.
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Rahul
> >> > > > > >
> >> > > > > >
> >> > > > > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <
> rahul.raj@option3.io
> >> >
> >> > > > wrote:
> >> > > > > >
> >> > > > > > > Hi Vitalii,
> >> > > > > > >
> >> > > > > > > There are two concrete implementations of the class -
> >> > > > > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> >> > > > > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> >> > > > > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies
> >> to
> >> > > > > Parquet.
> >> > > > > > > And pushdown only apply limit but not offset"
> >> > > > > > >
> >> > > > > > > Also I enabled debug mode and found LIMIT is not getting
> >> > > > > > > pushed to the query.
> >> > > > > > >     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0 rows, 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> >> > > > > > >       UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0 rows, 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> >> > > > > > >         LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {61.0 rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> >> > > > > > >           JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration" WHERE "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0 rows, 50.0 cpu
> >> > > > > > >
> >> > > > > > > Regarding the second point, it's the Java heap getting
> >> > > > > > > filled with jdbc results. How do we address this?
> >> > > > > > >
> >> > > > > > > Regards,
> >> > > > > > > Rahul
> >> > > > > > >
> >> > > > > > > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <
> >> > > vitalii@apache.org>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > >> Hi Rahul,
> >> > > > > > >>
> >> > > > > > >> Drill has the *DrillPushLimitToScanRule* [1] rule, which
> >> > > > > > >> should do this optimization, if the GroupScan supports
> >> > > > > > >> Limit Push Down.
> >> > > > > > >> Also you can verify in debug mode whether this rule is
> >> > > > > > >> fired.
> >> > > > > > >> Possibly for some external DBs (like MapR-DB) Drill should
> >> > > > > > >> have a separate class for this optimization [2].
> >> > > > > > >>
> >> > > > > > >> [1]
> >> > > > > > >> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> >> > > > > > >> [2]
> >> > > > > > >> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> >> > > > > > >>
> >> > > > > > >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <
> >> rahul.raj@option3.io
> >> > >
> >> > > > > wrote:
> >> > > > > > >>
> >> > > > > > >> > Hi,
> >> > > > > > >> >
> >> > > > > > >> > Drill does not push the LIMIT queries to external
> databases
> >> > and
> >> > > I
> >> > > > > > >> assume it
> >> > > > > > >> > could be more related to Calcite. This leads to out of
> >> memory
> >> > > > > > situations
> >> > > > > > >> > while querying large table to view few records.  Is there
> >> > > > something
> >> > > > > > that
> >> > > > > > >> > could be improved here? One solutions would be to push
> >> filters
> >> > > > down
> >> > > > > to
> >> > > > > > >> the
> >> > > > > > >> > DB and/or combined with some JDBC batch size limit to
> >> flush a
> >> > > part
> >> > > > > as
> >> > > > > > >> > parquet.
> >> > > > > > >> >
> >> > > > > > >> > Regards,
> >> > > > > > >> > Rahul
> >> > > > > > >> >
> >> > > > > > >> >
> >> > > > > > >>
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >> >
> >>
> >
>
>

Re: Drill JDBC Plugin limit queries

Posted by Rahul Raj <ra...@option3.io>.
Vitalii,

I tried debugging this, and the query received at
JdbcGroupScan#getSpecificScan is already without the limit clause - "SELECT
* FROM public.actor" - so the decision is made earlier. The debugger did
not hit the DrillPushLimitToScanRule rules either.

Any pointers?

Regards,
Rahul


Re: Drill JDBC Plugin limit queries

Posted by Rahul Raj <ra...@option3.io>.
Vitalii,

I made both the changes; it did not work and a full scan was issued as
shown in the plan below.

00-00    Screen : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {120.5 rows, 165.5 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 227
00-01      Project(actor_id=[$0], first_name=[$1], last_name=[$2], last_update=[$3]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {120.0 rows, 165.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 226
00-02        SelectionVectorRemover : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {115.0 rows, 145.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 225
00-03          Limit(fetch=[5]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {110.0 rows, 140.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 224
00-04            Limit(fetch=[5]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 5.0, cumulative cost = {105.0 rows, 120.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 223
00-05              Jdbc(sql=[SELECT * FROM "public"."actor" ]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 164

Regards,
Rahul

>> > > > > > >> [2]
>> > > > > > >>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
>> > > > > > >>
>> > > > > > >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <
>> rahul.raj@option3.io
>> > >
>> > > > > wrote:
>> > > > > > >>
>> > > > > > >> > Hi,
>> > > > > > >> >
>> > > > > > >> > Drill does not push the LIMIT queries to external databases
>> > and
>> > > I
>> > > > > > >> assume it
>> > > > > > >> > could be more related to Calcite. This leads to out of
>> memory
>> > > > > > situations
>> > > > > > >> > while querying large table to view few records.  Is there
>> > > > something
>> > > > > > that
>> > > > > > >> > could be improved here? One solutions would be to push
>> filters
>> > > > down
>> > > > > to
>> > > > > > >> the
>> > > > > > >> > DB and/or combined with some JDBC batch size limit to
>> flush a
>> > > part
>> > > > > as
>> > > > > > >> > parquet.
>> > > > > > >> >
>> > > > > > >> > Regards,
>> > > > > > >> > Rahul
>> > > > > > >> >
>> > > > > > >> > --
>> > > > > > >> > _*** This email and any files transmitted with it are
>> > > confidential
>> > > > > and
>> > > > > > >> > intended solely for the use of the individual or entity to
>> > whom
>> > > it
>> > > > > is
>> > > > > > >> > addressed. If you are not the named addressee then you
>> should
>> > > not
>> > > > > > >> > disseminate, distribute or copy this e-mail. Please notify
>> the
>> > > > > sender
>> > > > > > >> > immediately and delete this e-mail from your system.***_
>> > > > > > >> >
>> > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > _*** This email and any files transmitted with it are
>> confidential
>> > > and
>> > > > > > intended solely for the use of the individual or entity to whom
>> it
>> > is
>> > > > > > addressed. If you are not the named addressee then you should
>> not
>> > > > > > disseminate, distribute or copy this e-mail. Please notify the
>> > sender
>> > > > > > immediately and delete this e-mail from your system.***_
>> > > > > >
>> > > > >
>> > > >
>> > > > --
>> > > > _*** This email and any files transmitted with it are confidential
>> and
>> > > > intended solely for the use of the individual or entity to whom it
>> is
>> > > > addressed. If you are not the named addressee then you should not
>> > > > disseminate, distribute or copy this e-mail. Please notify the
>> sender
>> > > > immediately and delete this e-mail from your system.***_
>> > > >
>> > >
>> >
>> > --
>> > _*** This email and any files transmitted with it are confidential and
>> > intended solely for the use of the individual or entity to whom it is
>> > addressed. If you are not the named addressee then you should not
>> > disseminate, distribute or copy this e-mail. Please notify the sender
>> > immediately and delete this e-mail from your system.***_
>> >
>>
>

-- 
_*** This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom it is 
addressed. If you are not the named addressee then you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately and delete this e-mail from your system.***_

Re: Drill JDBC Plugin limit queries

Posted by Rahul Raj <ra...@option3.io>.
I will make the changes and update you.

Regards,
Rahul

On Fri, Oct 19, 2018 at 1:05 AM Vitalii Diravka <vi...@apache.org> wrote:

> Rahul,
>
> Possibly *JdbcGroupScan* can be improved, for instance by overriding
> *supportsLimitPushdown()* and *applyLimit()* methods,
> *double rows *field can be updated by the limit value.
>
> I've performed the following query: select * from mysql.`testdb`.`table`
> limit 2;
> but the following one is passed to MySQL: SELECT * FROM `testdb`.`table`
>
> https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java#L187
> So it is definitely should be improved.
>
> *Note:* Changed mailing list to devs.
>
> On Sun, Oct 14, 2018 at 6:30 AM Rahul Raj <ra...@option3.io> wrote:
>
> > Vitalii,
> >
> > Created documentation ticket DRILL-6794
> >
> > How do we proceed on extending the scan operators to support JDBC
> plugins?
> >
> > Regards,
> > Rahul
> >
> > On Sat, Oct 13, 2018 at 6:47 PM Vitalii Diravka <vi...@apache.org>
> > wrote:
> >
> > > To update the documentation, since that issues were solved by using
> these
> > > properties in connection URL:
> > > defaultRowFetchSize=10000  [1]
> > > defaultAutoCommit=false    [2]
> > > The full URL was there "url": "jdbc:postgresql://
> > >
> > >
> >
> myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000
> > > "
> > >
> > > If some issues are still present, it is also reasonable to create
> tickets
> > > to track them.
> > >
> > > [1]
> > >
> > >
> >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
> > > [2]
> > >
> > >
> >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E
> > >
> > > On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj <ra...@option3.io>
> wrote:
> > >
> > > > Should I create tickets to track these issues or should I create a
> > ticket
> > > > to update the documentation?
> > > >
> > > > Rahul
> > > >
> > > > On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka <vi...@apache.org>
> > > > wrote:
> > > >
> > > > > 1. You are right, it means it is reasonable to extend this rule for
> > > > > applying on other Scan operators (or possibly to create the
> separate
> > > > one).
> > > > > 2. There was a question about OOM issues in Drill + PostgreSQL,
> > please
> > > > take
> > > > > a look [1].
> > > > >     Since you are trying to setup this configs, It will be good, if
> > you
> > > > > create a Jira ticket to add this info to Drill docs [2]
> > > > >
> > > > > [1]
> > > > >
> > >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> > > > > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> > > > >
> > > > > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj <ra...@option3.io>
> > > wrote:
> > > > >
> > > > > > Regarding the heap out of error, it could be that the jdbc driver
> > is
> > > > > > prefetching the entire record set to memory. I just had a look at
> > > > > > JdbcRecordReader, looks like by setting
> > connection#autoCommit(false)
> > > > and
> > > > > a
> > > > > > sufficient fetch size we could force the driver to stream data as
> > > > > required.
> > > > > > This is how postgres driver works.
> > > > > >
> > > > >
> > > >
> > >
> >
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
> > > > > >
> > > > > > We will have to see the behaviour of other drivers too.
> > > > > >
> > > > > > Let me know your thoughts here.
> > > > > >
> > > > > > Regards,
> > > > > > Rahul
> > > > > >
> > > > > >
> > > > > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <ra...@option3.io>
> > > > wrote:
> > > > > >
> > > > > > > Hi Vitalii,
> > > > > > >
> > > > > > > There are two concrete implementations of the class -
> > > > > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > > > > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > > > > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies to
> > > > > Parquet.
> > > > > > > And pushdown only apply limit but not offset"
> > > > > > >
> > > > > > > Also I enabled debug mode and found LIMIT is not getting pushed
> > to
> > > > the
> > > > > > > query.
> > > > > > >     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost =
> > {83.0
> > > > > rows,
> > > > > > > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> > > > > > >       UnionExchangePrel: rowcount = 11.0, cumulative cost =
> {72.0
> > > > rows,
> > > > > > > 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> > > > > > >         LimitPrel(fetch=[11]): rowcount = 11.0, cumulative
> cost =
> > > > {61.0
> > > > > > > rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> > > > > > >           JdbcPrel(sql=[SELECT * FROM
> > "u_g001"."executioniteration"
> > > > > WHERE
> > > > > > > "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0
> > rows,
> > > > > 50.0
> > > > > > cpu
> > > > > > >
> > > > > > > Regarding the second point, its the java heap getting filled
> with
> > > > jdbc
> > > > > > > results. How do we address this?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Rahul
> > > > > > >
> > > > > > > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <
> > > vitalii@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Rahul,
> > > > > > >>
> > > > > > >> Drill has *DrillPushLimitToScanRule* [1] rule, which should do
> > > this
> > > > > > >> optimization, whether the GroupScan supports Limit Push Down.
> > > > > > >> Also you can verify in debug mode whether this rule is fired.
> > > > > > >> Possibly for some external DB (like MapR-DB) Drill should have
> > the
> > > > > > >> separate
> > > > > > >> class for this optimization [2].
> > > > > > >>
> > > > > > >> [1]
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> > > > > > >> [2]
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> > > > > > >>
> > > > > > >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <
> rahul.raj@option3.io
> > >
> > > > > wrote:
> > > > > > >>
> > > > > > >> > Hi,
> > > > > > >> >
> > > > > > >> > Drill does not push the LIMIT queries to external databases
> > and
> > > I
> > > > > > >> assume it
> > > > > > >> > could be more related to Calcite. This leads to out of
> memory
> > > > > > situations
> > > > > > >> > while querying large table to view few records.  Is there
> > > > something
> > > > > > that
> > > > > > >> > could be improved here? One solutions would be to push
> filters
> > > > down
> > > > > to
> > > > > > >> the
> > > > > > >> > DB and/or combined with some JDBC batch size limit to flush
> a
> > > part
> > > > > as
> > > > > > >> > parquet.
> > > > > > >> >
> > > > > > >> > Regards,
> > > > > > >> > Rahul
> > > > > > >> >
> > > > > > >> > --
> > > > > > >> > _*** This email and any files transmitted with it are
> > > confidential
> > > > > and
> > > > > > >> > intended solely for the use of the individual or entity to
> > whom
> > > it
> > > > > is
> > > > > > >> > addressed. If you are not the named addressee then you
> should
> > > not
> > > > > > >> > disseminate, distribute or copy this e-mail. Please notify
> the
> > > > > sender
> > > > > > >> > immediately and delete this e-mail from your system.***_
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > _*** This email and any files transmitted with it are
> confidential
> > > and
> > > > > > intended solely for the use of the individual or entity to whom
> it
> > is
> > > > > > addressed. If you are not the named addressee then you should not
> > > > > > disseminate, distribute or copy this e-mail. Please notify the
> > sender
> > > > > > immediately and delete this e-mail from your system.***_
> > > > > >
> > > > >
> > > >
> > > > --
> > > > _*** This email and any files transmitted with it are confidential
> and
> > > > intended solely for the use of the individual or entity to whom it is
> > > > addressed. If you are not the named addressee then you should not
> > > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > > immediately and delete this e-mail from your system.***_
> > > >
> > >
> >
> > --
> > _*** This email and any files transmitted with it are confidential and
> > intended solely for the use of the individual or entity to whom it is
> > addressed. If you are not the named addressee then you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately and delete this e-mail from your system.***_
> >
>


Re: Drill JDBC Plugin limit queries

Posted by Vitalii Diravka <vi...@apache.org>.
Rahul,

Possibly *JdbcGroupScan* can be improved, for instance by overriding
the *supportsLimitPushdown()* and *applyLimit()* methods;
the *double rows* field can then be updated with the limit value.

I ran the following query: select * from mysql.`testdb`.`table`
limit 2;
but the query passed to MySQL was: SELECT * FROM `testdb`.`table`
https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java#L187
So it definitely should be improved.

*Note:* Changed mailing list to devs.
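
A minimal sketch of the pattern described above. The method names mirror the
ones mentioned in this thread (*supportsLimitPushdown()* / *applyLimit()*),
but the class here is a simplified stand-in, not Drill's actual GroupScan API:

```java
// Simplified stand-in for Drill's JdbcGroupScan; not the real API.
class JdbcGroupScanSketch {
    private final String sql;
    private final double rows; // planner's row-count estimate

    JdbcGroupScanSketch(String sql, double rows) {
        this.sql = sql;
        this.rows = rows;
    }

    // Signals to the limit-pushdown rule that this scan can absorb a LIMIT.
    boolean supportsLimitPushdown() {
        return true;
    }

    // Returns a new scan whose generated SQL carries the limit and whose
    // row estimate is capped by it.
    JdbcGroupScanSketch applyLimit(int maxRecords) {
        return new JdbcGroupScanSketch(sql + " LIMIT " + maxRecords,
                Math.min(rows, maxRecords));
    }

    String getSql()  { return sql; }
    double getRows() { return rows; }
}
```

With these two methods in place, the planner's limit-pushdown rule could
rewrite the generated SQL instead of letting the reader fetch the full table.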

On Sun, Oct 14, 2018 at 6:30 AM Rahul Raj <ra...@option3.io> wrote:

> Vitalii,
>
> Created documentation ticket DRILL-6794
>
> How do we proceed on extending the scan operators to support JDBC plugins?
>
> Regards,
> Rahul
>
> On Sat, Oct 13, 2018 at 6:47 PM Vitalii Diravka <vi...@apache.org>
> wrote:
>
> > To update the documentation, since that issues were solved by using these
> > properties in connection URL:
> > defaultRowFetchSize=10000  [1]
> > defaultAutoCommit=false    [2]
> > The full URL was there "url": "jdbc:postgresql://
> >
> >
> myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000
> > "
> >
> > If some issues are still present, it is also reasonable to create tickets
> > to track them.
> >
> > [1]
> >
> >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
> > [2]
> >
> >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E
> >
> > On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj <ra...@option3.io> wrote:
> >
> > > Should I create tickets to track these issues or should I create a
> ticket
> > > to update the documentation?
> > >
> > > Rahul
> > >
> > > On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka <vi...@apache.org>
> > > wrote:
> > >
> > > > 1. You are right, it means it is reasonable to extend this rule for
> > > > applying on other Scan operators (or possibly to create the separate
> > > one).
> > > > 2. There was a question about OOM issues in Drill + PostgreSQL,
> please
> > > take
> > > > a look [1].
> > > >     Since you are trying to setup this configs, It will be good, if
> you
> > > > create a Jira ticket to add this info to Drill docs [2]
> > > >
> > > > [1]
> > > >
> > https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> > > > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> > > >
> > > > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj <ra...@option3.io>
> > wrote:
> > > >
> > > > > Regarding the heap out of error, it could be that the jdbc driver
> is
> > > > > prefetching the entire record set to memory. I just had a look at
> > > > > JdbcRecordReader, looks like by setting
> connection#autoCommit(false)
> > > and
> > > > a
> > > > > sufficient fetch size we could force the driver to stream data as
> > > > required.
> > > > > This is how postgres driver works.
> > > > >
> > > >
> > >
> >
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
> > > > >
> > > > > We will have to see the behaviour of other drivers too.
> > > > >
> > > > > Let me know your thoughts here.
> > > > >
> > > > > Regards,
> > > > > Rahul
> > > > >
> > > > >
> > > > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <ra...@option3.io>
> > > wrote:
> > > > >
> > > > > > Hi Vitalii,
> > > > > >
> > > > > > There are two concrete implementations of the class -
> > > > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > > > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > > > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies to
> > > > Parquet.
> > > > > > And pushdown only apply limit but not offset"
> > > > > >
> > > > > > Also I enabled debug mode and found LIMIT is not getting pushed
> to
> > > the
> > > > > > query.
> > > > > >     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost =
> {83.0
> > > > rows,
> > > > > > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> > > > > >       UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0
> > > rows,
> > > > > > 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> > > > > >         LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost =
> > > {61.0
> > > > > > rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> > > > > >           JdbcPrel(sql=[SELECT * FROM
> "u_g001"."executioniteration"
> > > > WHERE
> > > > > > "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0
> rows,
> > > > 50.0
> > > > > cpu
> > > > > >
> > > > > > Regarding the second point, its the java heap getting filled with
> > > jdbc
> > > > > > results. How do we address this?
> > > > > >
> > > > > > Regards,
> > > > > > Rahul
> > > > > >
> > > > > > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <
> > vitalii@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Rahul,
> > > > > >>
> > > > > >> Drill has *DrillPushLimitToScanRule* [1] rule, which should do
> > this
> > > > > >> optimization, whether the GroupScan supports Limit Push Down.
> > > > > >> Also you can verify in debug mode whether this rule is fired.
> > > > > >> Possibly for some external DB (like MapR-DB) Drill should have
> the
> > > > > >> separate
> > > > > >> class for this optimization [2].
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> > > > > >> [2]
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> > > > > >>
> > > > > >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <rahul.raj@option3.io
> >
> > > > wrote:
> > > > > >>
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > Drill does not push the LIMIT queries to external databases
> and
> > I
> > > > > >> assume it
> > > > > >> > could be more related to Calcite. This leads to out of memory
> > > > > situations
> > > > > >> > while querying large table to view few records.  Is there
> > > something
> > > > > that
> > > > > >> > could be improved here? One solutions would be to push filters
> > > down
> > > > to
> > > > > >> the
> > > > > >> > DB and/or combined with some JDBC batch size limit to flush a
> > part
> > > > as
> > > > > >> > parquet.
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > Rahul
> > > > > >> >
> > > > > >> > --
> > > > > >> > _*** This email and any files transmitted with it are
> > confidential
> > > > and
> > > > > >> > intended solely for the use of the individual or entity to
> whom
> > it
> > > > is
> > > > > >> > addressed. If you are not the named addressee then you should
> > not
> > > > > >> > disseminate, distribute or copy this e-mail. Please notify the
> > > > sender
> > > > > >> > immediately and delete this e-mail from your system.***_
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > > > --
> > > > > _*** This email and any files transmitted with it are confidential
> > and
> > > > > intended solely for the use of the individual or entity to whom it
> is
> > > > > addressed. If you are not the named addressee then you should not
> > > > > disseminate, distribute or copy this e-mail. Please notify the
> sender
> > > > > immediately and delete this e-mail from your system.***_
> > > > >
> > > >
> > >
> > > --
> > > _*** This email and any files transmitted with it are confidential and
> > > intended solely for the use of the individual or entity to whom it is
> > > addressed. If you are not the named addressee then you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately and delete this e-mail from your system.***_
> > >
> >
>
> --
> _*** This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.***_
>

Re: Drill JDBC Plugin limit queries

Posted by Rahul Raj <ra...@option3.io>.
Vitalii,

Created documentation ticket DRILL-6794

How do we proceed on extending the scan operators to support JDBC plugins?

Regards,
Rahul
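
As a side note, the cursor-streaming settings discussed in the quoted message
below can be assembled into a connection URL like this. This is only a sketch:
the property names (`defaultRowFetchSize`, `defaultAutoCommit`) are taken from
the URL quoted in the thread, and the helper class itself is hypothetical:

```java
// Hypothetical helper; the property names come from the URL quoted in
// this thread, not from an authoritative driver reference.
class PgStreamingUrl {
    static String build(String hostAndDb, int fetchSize) {
        // defaultRowFetchSize makes the driver fetch rows in batches;
        // defaultAutoCommit=false keeps the cursor open across fetches.
        return "jdbc:postgresql://" + hostAndDb
                + "?defaultRowFetchSize=" + fetchSize
                + "&defaultAutoCommit=false";
    }
}
```

The same effect can presumably also be achieved programmatically in the record
reader via Connection#setAutoCommit(false) and Statement#setFetchSize, as
discussed earlier in the thread.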


-- 
_*** This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom it is 
addressed. If you are not the named addressee then you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately and delete this e-mail from your system.***_

Re: Drill JDBC Plugin limit queries

Posted by Vitalii Diravka <vi...@apache.org>.
To update the documentation, since those issues were solved by using these
properties in the connection URL:
defaultRowFetchSize=10000  [1]
defaultAutoCommit=false    [2]
The full URL used there was "url": "jdbc:postgresql://
myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000
"

If some issues are still present, it is also reasonable to create tickets
to track them.

[1]
https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
[2]
https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E


Re: Drill JDBC Plugin limit queries

Posted by Rahul Raj <ra...@option3.io>.
Should I create tickets to track these issues or should I create a ticket
to update the documentation?

Rahul


Re: Drill JDBC Plugin limit queries

Posted by Vitalii Diravka <vi...@apache.org>.
1. You are right; it means it is reasonable to extend this rule to apply to
other Scan operators (or possibly to create a separate one).
2. There was a question about OOM issues in Drill + PostgreSQL, please take
a look [1].
    Since you are trying to set up these configs, it would be good if you
created a Jira ticket to add this info to the Drill docs [2]

[1] https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
[2] https://drill.apache.org/docs/rdbms-storage-plugin/


Re: Drill JDBC Plugin limit queries

Posted by Rahul Raj <ra...@option3.io>.
Regarding the heap out-of-memory error, it could be that the JDBC driver is
prefetching the entire record set into memory. I just had a look at
JdbcRecordReader; it looks like by setting connection#autoCommit(false) and a
sufficient fetch size we could force the driver to stream data as required.
This is how the Postgres driver works.
https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
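As a rough illustration (a sketch, not the actual JdbcRecordReader code; the
query and connection handling are made up for the example), the driver-side
streaming setup would look like:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingFetch {

    // Per the Postgres driver docs, a server-side cursor is only used when
    // autoCommit is off and a positive fetch size is set.
    static boolean cursorModePossible(boolean autoCommit, int fetchSize) {
        return !autoCommit && fetchSize > 0;
    }

    // Sketch: read a large table in fetch-size batches instead of
    // materializing the whole ResultSet on the heap.
    static void streamRows(Connection conn, String query, int fetchSize)
            throws SQLException {
        conn.setAutoCommit(false);          // required for cursor-based fetch
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize);   // rows fetched per round trip
            try (ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    // process one row; only ~fetchSize rows are in memory
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(cursorModePossible(false, 10000)); // true
        System.out.println(cursorModePossible(true, 10000));  // false
    }
}
```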

We will have to see the behaviour of other drivers too.

Let me know your thoughts here.

Regards,
Rahul



Re: Drill JDBC Plugin limit queries

Posted by Rahul Raj <ra...@option3.io>.
Hi Vitalii,

There are two concrete implementations of the class -
DrillPushLimitToScanRule LIMIT_ON_SCAN and
DrillPushLimitToScanRule LIMIT_ON_PROJECT.
LIMIT_ON_SCAN has a comment mentioning "For now only applies to Parquet.
And pushdown only apply limit but not offset"

Also, I enabled debug mode and found that the LIMIT is not getting pushed
into the query:
    LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0 rows,
226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
      UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0 rows,
182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
        LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {61.0
rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
          JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration" WHERE
"id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0 rows, 50.0 cpu

Regarding the second point, it's the Java heap getting filled with JDBC
results. How do we address this?

Regards,
Rahul


Re: Drill JDBC Plugin limit queries

Posted by Vitalii Diravka <vi...@apache.org>.
Hi Rahul,

Drill has the *DrillPushLimitToScanRule* rule [1], which should perform this
optimization if the GroupScan supports limit pushdown.
Also, you can verify in debug mode whether this rule is fired.
Possibly for some external DBs (like MapR-DB) Drill should have a separate
class for this optimization [2].

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
[2]
https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
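To make the idea concrete, here is a toy model of what such a rule does
(all class and method names below are illustrative, not Drill's or Calcite's
actual API):

```java
import java.util.Optional;

// Toy model of limit pushdown: a Limit above a Scan is folded into the
// Scan when the scan supports it, mirroring what a planner rule like
// DrillPushLimitToScanRule does for Parquet scans.
public class LimitPushdownSketch {

    static class Scan {
        final String table;
        final Optional<Integer> limit;   // empty = full scan
        final boolean supportsLimitPushdown;
        Scan(String table, Optional<Integer> limit, boolean supports) {
            this.table = table;
            this.limit = limit;
            this.supportsLimitPushdown = supports;
        }
    }

    static class Limit {
        final int fetch;
        final Scan input;
        Limit(int fetch, Scan input) { this.fetch = fetch; this.input = input; }
    }

    // The "rule": if the scan supports pushdown, return a new Scan that
    // carries the limit, making the Limit operator above it redundant.
    static Optional<Scan> onMatch(Limit limit) {
        if (!limit.input.supportsLimitPushdown) {
            return Optional.empty();     // rule does not fire (e.g. JDBC today)
        }
        return Optional.of(new Scan(limit.input.table,
                Optional.of(limit.fetch), true));
    }

    public static void main(String[] args) {
        Limit onParquet = new Limit(11, new Scan("parquet_tbl", Optional.empty(), true));
        Limit onJdbc = new Limit(11, new Scan("jdbc_tbl", Optional.empty(), false));
        System.out.println(onMatch(onParquet).isPresent()); // true
        System.out.println(onMatch(onJdbc).isPresent());    // false
    }
}
```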
