You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@drill.apache.org by Sudip Mukherjee <sm...@commvault.com> on 2016/02/19 19:16:23 UTC

Optimizing SUM(1) query

Hi,

Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
Getting these query while using Tableau. Mostly probably it is trying to figure out NUMBER_OF_RECORDS.

Thanks,
Sudip



***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************

Re: Optimizing SUM(1) query

Posted by Julian Hyde <jh...@apache.org>.

PS I did recall correctly:
https://issues.apache.org/jira/browse/HIVE-6192. But it's not
implemented using Calcite, sadly.

On Fri, Feb 19, 2016 at 12:11 PM, Julian Hyde <jh...@apache.org> wrote:
> And indeed COUNT(*) is equivalent to COUNT(1). COUNT(*) is the same as
> COUNT(e) where e is any not-null value.
>
> I would argue that SUM(1) should be optimized to COUNT(*). Or,
> generalizing a bit, that SUM(c) should be optimized to COUNT(*) * c.
>
> IIRC, Hive performs that optimization. It's a bit tricky, because in
> Calcite the expression will be in a Project below (hopefully directly
> below) the Aggregate, an the Aggregate just sees a column. But using
> RelMdPredicates you can see that the column is always equal to 1.
>
> Julian
>
>
> On Fri, Feb 19, 2016 at 11:44 AM, Aman Sinha <am...@apache.org> wrote:
>> For #records, why would tableau generate sum(1) instead of count(1) ?
>> Drill does not have specific optimization for sum(1).   It does have
>> optimization for count for Parquet data.
>>
>> Aman
>>
>> On Fri, Feb 19, 2016 at 10:16 AM, Sudip Mukherjee <sm...@commvault.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
>>> Getting these query while using Tableau. Mostly probably it is trying to
>>> figure out NUMBER_OF_RECORDS.
>>>
>>> Thanks,
>>> Sudip
>>>
>>>
>>>
>>> ***************************Legal Disclaimer***************************
>>> "This communication may contain confidential and privileged material for
>>> the
>>> sole use of the intended recipient. Any unauthorized review, use or
>>> distribution
>>> by others is strictly prohibited. If you have received the message by
>>> mistake,
>>> please advise the sender by reply email and delete the message. Thank you."
>>> **********************************************************************

Re: Optimizing SUM(1) query

Posted by Julian Hyde <jh...@apache.org>.

And indeed COUNT(*) is equivalent to COUNT(1). COUNT(*) is the same as
COUNT(e) where e is any not-null value.

I would argue that SUM(1) should be optimized to COUNT(*). Or,
generalizing a bit, that SUM(c) should be optimized to COUNT(*) * c.

IIRC, Hive performs that optimization. It's a bit tricky, because in
Calcite the expression will be in a Project below (hopefully directly
below) the Aggregate, an the Aggregate just sees a column. But using
RelMdPredicates you can see that the column is always equal to 1.

Julian

On Fri, Feb 19, 2016 at 11:44 AM, Aman Sinha <am...@apache.org> wrote:
> For #records, why would tableau generate sum(1) instead of count(1) ?
> Drill does not have specific optimization for sum(1).   It does have
> optimization for count for Parquet data.
>
> Aman
>
> On Fri, Feb 19, 2016 at 10:16 AM, Sudip Mukherjee <sm...@commvault.com>
> wrote:
>
>> Hi,
>>
>> Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
>> Getting these query while using Tableau. Mostly probably it is trying to
>> figure out NUMBER_OF_RECORDS.
>>
>> Thanks,
>> Sudip
>>
>>
>>
>> ***************************Legal Disclaimer***************************
>> "This communication may contain confidential and privileged material for
>> the
>> sole use of the intended recipient. Any unauthorized review, use or
>> distribution
>> by others is strictly prohibited. If you have received the message by
>> mistake,
>> please advise the sender by reply email and delete the message. Thank you."
>> **********************************************************************

Re: Optimizing SUM(1) query

Posted by Aman Sinha <am...@apache.org>.

For #records, why would tableau generate sum(1) instead of count(1) ?
Drill does not have specific optimization for sum(1).   It does have
optimization for count for Parquet data.

Aman

On Fri, Feb 19, 2016 at 10:16 AM, Sudip Mukherjee <sm...@commvault.com>
wrote:

> Hi,
>
> Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
> Getting these query while using Tableau. Mostly probably it is trying to
> figure out NUMBER_OF_RECORDS.
>
> Thanks,
> Sudip
>
>
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for
> the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************

RE: Optimizing SUM(1) query

Posted by Sudip Mukherjee <sm...@commvault.com>.

Hi Jacques,

Thanks for your comments. I've created a JIRA for improvement in future release
https://issues.apache.org/jira/browse/DRILL-4516

Thanks,
Sudip

-----Original Message-----
From: Jacques Nadeau [mailto:jacques@dremio.com] 
Sent: 17 March 2016 AM 02:33
To: dev
Subject: Re: Optimizing SUM(1) query

I don't think Julian is saying it does this, I think he is saying it should. I agree. (This actually is very common Tableau query pattern among other things.)

Sudip, do you want to open an enhancement JIRA where we rewrite SUM(1) to COUNT(1). Then our existing count optimizations can take over.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Mar 16, 2016 at 8:38 AM, Sudip Mukherjee <sm...@commvault.com>
wrote:

> I don't see DRILL is transforming the query. Tried with a CSV file.
> Please let me know if I am missing something.
>
> 00-00    Screen : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0,
> cumulative cost = {3.1 rows, 17.1 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 260
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(INTEGER EXPR$0):
> rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 
> network,
> 0.0 memory}, id = 259
> 00-02        StreamAgg(group=[{}], EXPR$0=[SUM($0)]) : rowType =
> RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {3.0 
> rows,
> 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 258
> 00-03          Project($f0=[1]) : rowType = RecordType(INTEGER $f0):
> rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 
> network,
> 0.0 memory}, id = 257
> 00-04            Scan(groupscan=[EasyGroupScan
> [selectionRoot=file:/C:/data/company.csv, numFiles=1, columns=[`*`],
> files=[file:/C:/data/company.csv]]]) : rowType = RecordType(): 
> rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 256
>
> Thanks,
> Sudip
>
> -----Original Message-----
> From: Julian Hyde [mailto:jhyde@apache.org]
> Sent: 16 March 2016 AM 12:50
> To: dev@drill.apache.org
> Subject: Re: Optimizing SUM(1) query
>
> Is there any reason why Drill cannot transform SUM(1) to COUNT(*) at 
> an early stage (i.e. using a logical optimization rule) so that this 
> optimization does not need to be done for each engine?
>
> > On Mar 15, 2016, at 5:29 AM, Sudip Mukherjee 
> > <sm...@commvault.com>
> wrote:
> >
> > I was trying to have an Optimizer rule for the solr storage plugin 
> > that
> I'm working on for this query. Trying to use SOLR field stats for this 
> , so that the query is faster..
> > Getting the below exception while transforming project to scan. 
> > Could
> you please advise?
> >
> >
> > 2016-03-15 08:20:35,149 
> > [291801ee-33fc-064d-7aff-18391f15ae0e:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Logical :
> > DrillScreenRel: rowcount = 1.0, cumulative cost = {60.1 rows, 320.1 
> > cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 49
> >  DrillProjectRel(EXPR$0=[$0]): rowcount = 1.0, cumulative cost = 
> > {60.0
> rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 48
> >    DrillAggregateRel(group=[{}], EXPR$0=[SUM($0)]): rowcount = 1.0,
> cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 
> memory}, id = 46
> >      DrillProjectRel($f0=[1]): rowcount = 20.0, cumulative cost = 
> > {40.0
> rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44
> >        DrillScanRel(table=[[solr, ANalert_494]], 
> > groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec 
> > [solrCoreName=ANalert_494, solrUrl=http://localhost:20000/solr/
> > filter=[], solrDocFetchCount=-1, aggreegation=[]], columns=[`*`]]]):
> > rowcount = 20.0, cumulative cost = {20.0 rows, 0.0 cpu, 0.0 io, 0.0 
> > network, 0.0 memory}, id = 26
> >
> > 2016-03-15 08:20:35,201 
> > [291801ee-33fc-064d-7aff-18391f15ae0e:foreman]
> > DEBUG o.a.drill.exec.work.foreman.Foreman -
> > 291801ee-33fc-064d-7aff-18391f15ae0e: State change requested PENDING
> > --> FAILED
> > org.apache.drill.exec.work.foreman.ForemanException: Unexpected
> exception during fragment initialization: index (0) must be less than 
> size
> (0)
> >       at
> org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255)
> [drill-java-exec.jar:1.4.0]
> >       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.8.0_65]
> >       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.8.0_65]
> >       at java.lang.Thread.run(Unknown Source) [na:1.8.0_65] Caused by:
> > java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
> >       at
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.j
> ava:305)
> ~[com.google.guava-guava.jar:na]
> >       at
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.j
> ava:284)
> ~[com.google.guava-guava.jar:na]
> >       at
> com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.ja
> va:80)
> ~[com.google.guava-guava.jar:na]
> >       at org.apache.calcite.util.Pair$6.get(Pair.java:335)
> ~[org.apache.calcite-calcite-core.jar:1.4.0-drill-r10]
> >       at
> org.apache.drill.exec.planner.StarColumnHelper.containsStarColumnInPro
> ject(StarColumnHelper.java:60)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.vis
> itProject(StarColumnConverter.java:138)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.vis
> itProject(StarColumnConverter.java:45)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.ProjectPrel.accept(ProjectPrel.
> java:69)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.vis
> itScreen(StarColumnConverter.java:77)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.vis
> itScreen(StarColumnConverter.java:45)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.ScreenPrel.accept(ScreenPrel.ja
> va:64)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.ins
> ertRenameProject(StarColumnConverter.java:72)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertTo
> Prel(DefaultSqlHandler.java:326)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(D
> efaultSqlHandler.java:175)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWor
> > ke
> > r.java:197) ~[drill-java-exec.jar:1.4.0]
> >
> >
> > Thanks,
> > Sudip
> >
> > -----Original Message-----
> > From: Sudip Mukherjee [mailto:smukherjee@commvault.com]
> > Sent: 07 March 2016 PM 04:45
> > To: dev@drill.apache.org
> > Subject: RE: Optimizing SUM(1) query
> >
> > Hi,
> >
> > Extremely sorry for the delayed response!
> >
> > Basically , I am experimenting an Apache Solr plugin for Drill and 
> > tried to use Tableau. On intial selection of a "Table" , I see these 
> > queries from drill webui profile
> >
> > SELECT * FROM (SELECT * FROM `solr`.`CSVStringData`) T LIMIT 0
> >
> > SELECT * FROM (SELECT SUM(1) AS `COL` FROM `solr`.`CSVStringData` 
> > `CSVStringData` HAVING COUNT(1)>0) T LIMIT 0
> >
> > I have a dataset of 2,297,451,8 and when I try access that,the 
> > tableau
> screen in kind of stuck in "Processing Request" for 5 minutes.
> >
> > Thanks,
> > Sudip
> >
> > -----Original Message-----
> > From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com]
> > Sent: 20 February 2016 AM 03:42
> > To: dev@drill.apache.org
> > Subject: Re: Optimizing SUM(1) query
> >
> > What are you trying to do in Tableau that generates these queries?
> >
> > Do you have the actual full query that is being generated?
> >
> > Tableau has a number of customization features, wonder if it is 
> > possible
> to optimize those settings to avoid this issue.
> >
> > --Andries
> >
> >
> >> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee 
> >> <sm...@commvault.com>
> wrote:
> >>
> >> Hi,
> >>
> >> Have anyone tried optimizing SUM(1) query in drill? Or is it
> implemented?
> >> Getting these query while using Tableau. Mostly probably it is 
> >> trying
> to figure out NUMBER_OF_RECORDS.
> >>
> >> Thanks,
> >> Sudip
> >>
> >>
> >>
> >> ***************************Legal
> >> Disclaimer***************************
> >> "This communication may contain confidential and privileged 
> >> material for the sole use of the intended recipient. Any 
> >> unauthorized review, use or distribution by others is strictly 
> >> prohibited. If you have received the message by mistake, please 
> >> advise the sender by reply
> email and delete the message. Thank you."
> >> *******************************************************************
> >> **
> >> *
> >
> >
> >
> >
> >
> > ***************************Legal 
> > Disclaimer***************************
> > "This communication may contain confidential and privileged material 
> > for
> the sole use of the intended recipient. Any unauthorized review, use 
> or distribution by others is strictly prohibited. If you have received 
> the message by mistake, please advise the sender by reply email and 
> delete the message. Thank you."
> > ********************************************************************
> > **
> >
> >
> >
> >
> > ***************************Legal 
> > Disclaimer***************************
> > "This communication may contain confidential and privileged material 
> > for the sole use of the intended recipient. Any unauthorized review, 
> > use or distribution by others is strictly prohibited. If you have 
> > received the message by mistake, please advise the sender by reply 
> > email
> and delete the message. Thank you."
> > ********************************************************************
> > **
>
>
>
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly prohibited. If you have 
> received the message by mistake, please advise the sender by reply 
> email and delete the message. Thank you."
> **********************************************************************
>



***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************

Re: Optimizing SUM(1) query

Posted by Jacques Nadeau <ja...@dremio.com>.

I don't think Julian is saying it does this, I think he is saying it
should. I agree. (This actually is very common Tableau query pattern among
other things.)

Sudip, do you want to open an enhancement JIRA where we rewrite SUM(1) to
COUNT(1). Then our existing count optimizations can take over.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Mar 16, 2016 at 8:38 AM, Sudip Mukherjee <sm...@commvault.com>
wrote:

> I don't see DRILL is transforming the query. Tried with a CSV file.
> Please let me know if I am missing something.
>
> 00-00    Screen : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0,
> cumulative cost = {3.1 rows, 17.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
> = 260
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(INTEGER EXPR$0):
> rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 259
> 00-02        StreamAgg(group=[{}], EXPR$0=[SUM($0)]) : rowType =
> RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {3.0 rows,
> 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 258
> 00-03          Project($f0=[1]) : rowType = RecordType(INTEGER $f0):
> rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 257
> 00-04            Scan(groupscan=[EasyGroupScan
> [selectionRoot=file:/C:/data/company.csv, numFiles=1, columns=[`*`],
> files=[file:/C:/data/company.csv]]]) : rowType = RecordType(): rowcount =
> 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 256
>
> Thanks,
> Sudip
>
> -----Original Message-----
> From: Julian Hyde [mailto:jhyde@apache.org]
> Sent: 16 March 2016 AM 12:50
> To: dev@drill.apache.org
> Subject: Re: Optimizing SUM(1) query
>
> Is there any reason why Drill cannot transform SUM(1) to COUNT(*) at an
> early stage (i.e. using a logical optimization rule) so that this
> optimization does not need to be done for each engine?
>
> > On Mar 15, 2016, at 5:29 AM, Sudip Mukherjee <sm...@commvault.com>
> wrote:
> >
> > I was trying to have an Optimizer rule for the solr storage plugin that
> I'm working on for this query. Trying to use SOLR field stats for this , so
> that the query is faster..
> > Getting the below exception while transforming project to scan. Could
> you please advise?
> >
> >
> > 2016-03-15 08:20:35,149 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Logical :
> > DrillScreenRel: rowcount = 1.0, cumulative cost = {60.1 rows, 320.1
> > cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 49
> >  DrillProjectRel(EXPR$0=[$0]): rowcount = 1.0, cumulative cost = {60.0
> rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 48
> >    DrillAggregateRel(group=[{}], EXPR$0=[SUM($0)]): rowcount = 1.0,
> cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0
> memory}, id = 46
> >      DrillProjectRel($f0=[1]): rowcount = 20.0, cumulative cost = {40.0
> rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44
> >        DrillScanRel(table=[[solr, ANalert_494]],
> > groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec
> > [solrCoreName=ANalert_494, solrUrl=http://localhost:20000/solr/
> > filter=[], solrDocFetchCount=-1, aggreegation=[]], columns=[`*`]]]):
> > rowcount = 20.0, cumulative cost = {20.0 rows, 0.0 cpu, 0.0 io, 0.0
> > network, 0.0 memory}, id = 26
> >
> > 2016-03-15 08:20:35,201 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman]
> > DEBUG o.a.drill.exec.work.foreman.Foreman -
> > 291801ee-33fc-064d-7aff-18391f15ae0e: State change requested PENDING
> > --> FAILED
> > org.apache.drill.exec.work.foreman.ForemanException: Unexpected
> exception during fragment initialization: index (0) must be less than size
> (0)
> >       at
> org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255)
> [drill-java-exec.jar:1.4.0]
> >       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.8.0_65]
> >       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.8.0_65]
> >       at java.lang.Thread.run(Unknown Source) [na:1.8.0_65] Caused by:
> > java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
> >       at
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
> ~[com.google.guava-guava.jar:na]
> >       at
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
> ~[com.google.guava-guava.jar:na]
> >       at
> com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80)
> ~[com.google.guava-guava.jar:na]
> >       at org.apache.calcite.util.Pair$6.get(Pair.java:335)
> ~[org.apache.calcite-calcite-core.jar:1.4.0-drill-r10]
> >       at
> org.apache.drill.exec.planner.StarColumnHelper.containsStarColumnInProject(StarColumnHelper.java:60)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:138)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:45)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.ProjectPrel.accept(ProjectPrel.java:69)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:77)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:45)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.ScreenPrel.accept(ScreenPrel.java:64)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.insertRenameProject(StarColumnConverter.java:72)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:326)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:175)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorke
> > r.java:197) ~[drill-java-exec.jar:1.4.0]
> >
> >
> > Thanks,
> > Sudip
> >
> > -----Original Message-----
> > From: Sudip Mukherjee [mailto:smukherjee@commvault.com]
> > Sent: 07 March 2016 PM 04:45
> > To: dev@drill.apache.org
> > Subject: RE: Optimizing SUM(1) query
> >
> > Hi,
> >
> > Extremely sorry for the delayed response!
> >
> > Basically , I am experimenting an Apache Solr plugin for Drill and
> > tried to use Tableau. On intial selection of a "Table" , I see these
> > queries from drill webui profile
> >
> > SELECT * FROM (SELECT * FROM `solr`.`CSVStringData`) T LIMIT 0
> >
> > SELECT * FROM (SELECT SUM(1) AS `COL` FROM `solr`.`CSVStringData`
> > `CSVStringData` HAVING COUNT(1)>0) T LIMIT 0
> >
> > I have a dataset of 2,297,451,8 and when I try access that,the tableau
> screen in kind of stuck in "Processing Request" for 5 minutes.
> >
> > Thanks,
> > Sudip
> >
> > -----Original Message-----
> > From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com]
> > Sent: 20 February 2016 AM 03:42
> > To: dev@drill.apache.org
> > Subject: Re: Optimizing SUM(1) query
> >
> > What are you trying to do in Tableau that generates these queries?
> >
> > Do you have the actual full query that is being generated?
> >
> > Tableau has a number of customization features, wonder if it is possible
> to optimize those settings to avoid this issue.
> >
> > --Andries
> >
> >
> >> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee <sm...@commvault.com>
> wrote:
> >>
> >> Hi,
> >>
> >> Have anyone tried optimizing SUM(1) query in drill? Or is it
> implemented?
> >> Getting these query while using Tableau. Mostly probably it is trying
> to figure out NUMBER_OF_RECORDS.
> >>
> >> Thanks,
> >> Sudip
> >>
> >>
> >>
> >> ***************************Legal
> >> Disclaimer***************************
> >> "This communication may contain confidential and privileged material
> >> for the sole use of the intended recipient. Any unauthorized review,
> >> use or distribution by others is strictly prohibited. If you have
> >> received the message by mistake, please advise the sender by reply
> email and delete the message. Thank you."
> >> *********************************************************************
> >> *
> >
> >
> >
> >
> >
> > ***************************Legal Disclaimer***************************
> > "This communication may contain confidential and privileged material for
> the sole use of the intended recipient. Any unauthorized review, use or
> distribution by others is strictly prohibited. If you have received the
> message by mistake, please advise the sender by reply email and delete the
> message. Thank you."
> > **********************************************************************
> >
> >
> >
> >
> > ***************************Legal Disclaimer***************************
> > "This communication may contain confidential and privileged material
> > for the sole use of the intended recipient. Any unauthorized review,
> > use or distribution by others is strictly prohibited. If you have
> > received the message by mistake, please advise the sender by reply email
> and delete the message. Thank you."
> > **********************************************************************
>
>
>
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for
> the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************
>

RE: Optimizing SUM(1) query

Posted by Sudip Mukherjee <sm...@commvault.com>.

I don't see DRILL is transforming the query. Tried with a CSV file.  Please let me know if I am missing something.

00-00    Screen : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {3.1 rows, 17.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 260
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 259
00-02        StreamAgg(group=[{}], EXPR$0=[SUM($0)]) : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 258
00-03          Project($f0=[1]) : rowType = RecordType(INTEGER $f0): rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
00-04            Scan(groupscan=[EasyGroupScan [selectionRoot=file:/C:/data/company.csv, numFiles=1, columns=[`*`], files=[file:/C:/data/company.csv]]]) : rowType = RecordType(): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 256

Thanks,
Sudip

-----Original Message-----
From: Julian Hyde [mailto:jhyde@apache.org] 
Sent: 16 March 2016 AM 12:50
To: dev@drill.apache.org
Subject: Re: Optimizing SUM(1) query

Is there any reason why Drill cannot transform SUM(1) to COUNT(*) at an early stage (i.e. using a logical optimization rule) so that this optimization does not need to be done for each engine?

> On Mar 15, 2016, at 5:29 AM, Sudip Mukherjee <sm...@commvault.com> wrote:
> 
> I was trying to have an Optimizer rule for the solr storage plugin that I'm working on for this query. Trying to use SOLR field stats for this , so that the query is faster..
> Getting the below exception while transforming project to scan. Could you please advise?
> 
> 
> 2016-03-15 08:20:35,149 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Logical :
> DrillScreenRel: rowcount = 1.0, cumulative cost = {60.1 rows, 320.1 
> cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 49
>  DrillProjectRel(EXPR$0=[$0]): rowcount = 1.0, cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 48
>    DrillAggregateRel(group=[{}], EXPR$0=[SUM($0)]): rowcount = 1.0, cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 46
>      DrillProjectRel($f0=[1]): rowcount = 20.0, cumulative cost = {40.0 rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44
>        DrillScanRel(table=[[solr, ANalert_494]], 
> groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec 
> [solrCoreName=ANalert_494, solrUrl=http://localhost:20000/solr/ 
> filter=[], solrDocFetchCount=-1, aggreegation=[]], columns=[`*`]]]): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 0.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 26
> 
> 2016-03-15 08:20:35,201 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman] 
> DEBUG o.a.drill.exec.work.foreman.Foreman - 
> 291801ee-33fc-064d-7aff-18391f15ae0e: State change requested PENDING 
> --> FAILED
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: index (0) must be less than size (0)
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) [drill-java-exec.jar:1.4.0]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_65]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_65]
> 	at java.lang.Thread.run(Unknown Source) [na:1.8.0_65] Caused by: 
> java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
> 	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[com.google.guava-guava.jar:na]
> 	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[com.google.guava-guava.jar:na]
> 	at com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80) ~[com.google.guava-guava.jar:na]
> 	at org.apache.calcite.util.Pair$6.get(Pair.java:335) ~[org.apache.calcite-calcite-core.jar:1.4.0-drill-r10]
> 	at org.apache.drill.exec.planner.StarColumnHelper.containsStarColumnInProject(StarColumnHelper.java:60) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:138) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:45) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.ProjectPrel.accept(ProjectPrel.java:69) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:77) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:45) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.ScreenPrel.accept(ScreenPrel.java:64) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.insertRenameProject(StarColumnConverter.java:72) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:326) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:175) ~[drill-java-exec.jar:1.4.0]
> 	at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorke
> r.java:197) ~[drill-java-exec.jar:1.4.0]
> 
> 
> Thanks,
> Sudip
> 
> -----Original Message-----
> From: Sudip Mukherjee [mailto:smukherjee@commvault.com]
> Sent: 07 March 2016 PM 04:45
> To: dev@drill.apache.org
> Subject: RE: Optimizing SUM(1) query
> 
> Hi,
> 
> Extremely sorry for the delayed response!
> 
> Basically , I am experimenting an Apache Solr plugin for Drill and 
> tried to use Tableau. On intial selection of a "Table" , I see these 
> queries from drill webui profile
> 
> SELECT * FROM (SELECT * FROM `solr`.`CSVStringData`) T LIMIT 0
> 
> SELECT * FROM (SELECT SUM(1) AS `COL` FROM `solr`.`CSVStringData` 
> `CSVStringData` HAVING COUNT(1)>0) T LIMIT 0
> 
> I have a dataset of 2,297,451,8 and when I try access that,the tableau screen in kind of stuck in "Processing Request" for 5 minutes.
> 
> Thanks,
> Sudip
> 
> -----Original Message-----
> From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com]
> Sent: 20 February 2016 AM 03:42
> To: dev@drill.apache.org
> Subject: Re: Optimizing SUM(1) query
> 
> What are you trying to do in Tableau that generates these queries?
> 
> Do you have the actual full query that is being generated?
> 
> Tableau has a number of customization features, wonder if it is possible to optimize those settings to avoid this issue.
> 
> --Andries
> 
> 
>> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee <sm...@commvault.com> wrote:
>> 
>> Hi,
>> 
>> Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
>> Getting these query while using Tableau. Mostly probably it is trying to figure out NUMBER_OF_RECORDS.
>> 
>> Thanks,
>> Sudip
>> 
>> 
>> 
>> ***************************Legal 
>> Disclaimer***************************
>> "This communication may contain confidential and privileged material 
>> for the sole use of the intended recipient. Any unauthorized review, 
>> use or distribution by others is strictly prohibited. If you have 
>> received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
>> *********************************************************************
>> *
> 
> 
> 
> 
> 
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited. If you have received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************
> 
> 
> 
> 
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly prohibited. If you have 
> received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************




***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************

Re: Optimizing SUM(1) query

Posted by Julian Hyde <jh...@apache.org>.

Is there any reason why Drill cannot transform SUM(1) to COUNT(*) at an early stage (i.e. using a logical optimization rule) so that this optimization does not need to be done for each engine?

> On Mar 15, 2016, at 5:29 AM, Sudip Mukherjee <sm...@commvault.com> wrote:
> 
> I was trying to have an Optimizer rule for the solr storage plugin that I'm working on for this query. Trying to use SOLR field stats for this , so that the query is faster..
> Getting the below exception while transforming project to scan. Could you please advise?
> 
> 
> 2016-03-15 08:20:35,149 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Logical :
> DrillScreenRel: rowcount = 1.0, cumulative cost = {60.1 rows, 320.1 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 49
>  DrillProjectRel(EXPR$0=[$0]): rowcount = 1.0, cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 48
>    DrillAggregateRel(group=[{}], EXPR$0=[SUM($0)]): rowcount = 1.0, cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 46
>      DrillProjectRel($f0=[1]): rowcount = 20.0, cumulative cost = {40.0 rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44
>        DrillScanRel(table=[[solr, ANalert_494]], groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec [solrCoreName=ANalert_494, solrUrl=http://localhost:20000/solr/ filter=[], solrDocFetchCount=-1, aggreegation=[]], columns=[`*`]]]): rowcount = 20.0, cumulative cost = {20.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26
> 
> 2016-03-15 08:20:35,201 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman] DEBUG o.a.drill.exec.work.foreman.Foreman - 291801ee-33fc-064d-7aff-18391f15ae0e: State change requested PENDING --> FAILED
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: index (0) must be less than size (0)
> 	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) [drill-java-exec.jar:1.4.0]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_65]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_65]
> 	at java.lang.Thread.run(Unknown Source) [na:1.8.0_65]
> Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
> 	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[com.google.guava-guava.jar:na]
> 	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[com.google.guava-guava.jar:na]
> 	at com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80) ~[com.google.guava-guava.jar:na]
> 	at org.apache.calcite.util.Pair$6.get(Pair.java:335) ~[org.apache.calcite-calcite-core.jar:1.4.0-drill-r10]
> 	at org.apache.drill.exec.planner.StarColumnHelper.containsStarColumnInProject(StarColumnHelper.java:60) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:138) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:45) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.ProjectPrel.accept(ProjectPrel.java:69) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:77) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:45) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.ScreenPrel.accept(ScreenPrel.java:64) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.insertRenameProject(StarColumnConverter.java:72) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:326) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:175) ~[drill-java-exec.jar:1.4.0]
> 	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:197) ~[drill-java-exec.jar:1.4.0]
> 
> 
> Thanks,
> Sudip
> 
> -----Original Message-----
> From: Sudip Mukherjee [mailto:smukherjee@commvault.com] 
> Sent: 07 March 2016 PM 04:45
> To: dev@drill.apache.org
> Subject: RE: Optimizing SUM(1) query
> 
> Hi,
> 
> Extremely sorry for the delayed response!
> 
> Basically , I am experimenting an Apache Solr plugin for Drill and tried to use Tableau. On intial selection of a "Table" , I see these queries from drill webui profile
> 
> SELECT * FROM (SELECT * FROM `solr`.`CSVStringData`) T LIMIT 0
> 
> SELECT * FROM (SELECT SUM(1) AS `COL` FROM `solr`.`CSVStringData` `CSVStringData` HAVING COUNT(1)>0) T LIMIT 0 
> 
> I have a dataset of 2,297,451,8 and when I try access that,the tableau screen in kind of stuck in "Processing Request" for 5 minutes.
> 
> Thanks,
> Sudip
> 
> -----Original Message-----
> From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com]
> Sent: 20 February 2016 AM 03:42
> To: dev@drill.apache.org
> Subject: Re: Optimizing SUM(1) query
> 
> What are you trying to do in Tableau that generates these queries?
> 
> Do you have the actual full query that is being generated?
> 
> Tableau has a number of customization features, wonder if it is possible to optimize those settings to avoid this issue.
> 
> --Andries
> 
> 
>> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee <sm...@commvault.com> wrote:
>> 
>> Hi,
>> 
>> Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
>> Getting these query while using Tableau. Mostly probably it is trying to figure out NUMBER_OF_RECORDS.
>> 
>> Thanks,
>> Sudip
>> 
>> 
>> 
>> ***************************Legal Disclaimer***************************
>> "This communication may contain confidential and privileged material 
>> for the sole use of the intended recipient. Any unauthorized review, 
>> use or distribution by others is strictly prohibited. If you have 
>> received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
>> **********************************************************************
> 
> 
> 
> 
> 
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited. If you have received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************
> 
> 
> 
> 
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for the
> sole use of the intended recipient. Any unauthorized review, use or distribution
> by others is strictly prohibited. If you have received the message by mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************

RE: Optimizing SUM(1) query

Posted by Sudip Mukherjee <sm...@commvault.com>.

I was trying to have an Optimizer rule for the solr storage plugin that I'm working on for this query. Trying to use SOLR field stats for this , so that the query is faster..
Getting the below exception while transforming project to scan. Could you please advise?


2016-03-15 08:20:35,149 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Logical :
DrillScreenRel: rowcount = 1.0, cumulative cost = {60.1 rows, 320.1 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 49
  DrillProjectRel(EXPR$0=[$0]): rowcount = 1.0, cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 48
    DrillAggregateRel(group=[{}], EXPR$0=[SUM($0)]): rowcount = 1.0, cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 46
      DrillProjectRel($f0=[1]): rowcount = 20.0, cumulative cost = {40.0 rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44
        DrillScanRel(table=[[solr, ANalert_494]], groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec [solrCoreName=ANalert_494, solrUrl=http://localhost:20000/solr/ filter=[], solrDocFetchCount=-1, aggreegation=[]], columns=[`*`]]]): rowcount = 20.0, cumulative cost = {20.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26

2016-03-15 08:20:35,201 [291801ee-33fc-064d-7aff-18391f15ae0e:foreman] DEBUG o.a.drill.exec.work.foreman.Foreman - 291801ee-33fc-064d-7aff-18391f15ae0e: State change requested PENDING --> FAILED
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: index (0) must be less than size (0)
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) [drill-java-exec.jar:1.4.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_65]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_65]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_65]
Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[com.google.guava-guava.jar:na]
	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[com.google.guava-guava.jar:na]
	at com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80) ~[com.google.guava-guava.jar:na]
	at org.apache.calcite.util.Pair$6.get(Pair.java:335) ~[org.apache.calcite-calcite-core.jar:1.4.0-drill-r10]
	at org.apache.drill.exec.planner.StarColumnHelper.containsStarColumnInProject(StarColumnHelper.java:60) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:138) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitProject(StarColumnConverter.java:45) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.physical.ProjectPrel.accept(ProjectPrel.java:69) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:77) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.visitScreen(StarColumnConverter.java:45) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.physical.ScreenPrel.accept(ScreenPrel.java:64) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.insertRenameProject(StarColumnConverter.java:72) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:326) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:175) ~[drill-java-exec.jar:1.4.0]
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:197) ~[drill-java-exec.jar:1.4.0]


Thanks,
Sudip

-----Original Message-----
From: Sudip Mukherjee [mailto:smukherjee@commvault.com] 
Sent: 07 March 2016 PM 04:45
To: dev@drill.apache.org
Subject: RE: Optimizing SUM(1) query

Hi,

Extremely sorry for the delayed response!

Basically , I am experimenting an Apache Solr plugin for Drill and tried to use Tableau. On intial selection of a "Table" , I see these queries from drill webui profile

SELECT * FROM (SELECT * FROM `solr`.`CSVStringData`) T LIMIT 0

SELECT * FROM (SELECT SUM(1) AS `COL` FROM `solr`.`CSVStringData` `CSVStringData` HAVING COUNT(1)>0) T LIMIT 0 

I have a dataset of 2,297,451,8 and when I try access that,the tableau screen in kind of stuck in "Processing Request" for 5 minutes.

Thanks,
Sudip

-----Original Message-----
From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com]
Sent: 20 February 2016 AM 03:42
To: dev@drill.apache.org
Subject: Re: Optimizing SUM(1) query

What are you trying to do in Tableau that generates these queries?

Do you have the actual full query that is being generated?

Tableau has a number of customization features, wonder if it is possible to optimize those settings to avoid this issue.

--Andries


> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee <sm...@commvault.com> wrote:
> 
> Hi,
> 
> Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
> Getting these query while using Tableau. Mostly probably it is trying to figure out NUMBER_OF_RECORDS.
> 
> Thanks,
> Sudip
> 
> 
> 
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly prohibited. If you have 
> received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************





***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited. If you have received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
**********************************************************************




***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************

RE: Optimizing SUM(1) query

Posted by Sudip Mukherjee <sm...@commvault.com>.

Hi,

Extremely sorry for the delayed response!

Basically , I am experimenting an Apache Solr plugin for Drill and tried to use Tableau. On intial selection of a "Table" , I see these queries from drill webui profile

SELECT * FROM (SELECT * FROM `solr`.`CSVStringData`) T LIMIT 0

SELECT * FROM (SELECT SUM(1) AS `COL` FROM `solr`.`CSVStringData` `CSVStringData` HAVING COUNT(1)>0) T LIMIT 0 

I have a dataset of 2,297,451,8 and when I try access that,the tableau screen in kind of stuck in "Processing Request" for 5 minutes.

Thanks,
Sudip

-----Original Message-----
From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com] 
Sent: 20 February 2016 AM 03:42
To: dev@drill.apache.org
Subject: Re: Optimizing SUM(1) query

What are you trying to do in Tableau that generates these queries?

Do you have the actual full query that is being generated?

Tableau has a number of customization features, wonder if it is possible to optimize those settings to avoid this issue.

--Andries

> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee <sm...@commvault.com> wrote:
> 
> Hi,
> 
> Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
> Getting these query while using Tableau. Mostly probably it is trying to figure out NUMBER_OF_RECORDS.
> 
> Thanks,
> Sudip
> 
> 
> 
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly prohibited. If you have 
> received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************

***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************

Re: Optimizing SUM(1) query

Posted by Andries Engelbrecht <ae...@maprtech.com>.

What are you trying to do in Tableau that generates these queries?

Do you have the actual full query that is being generated?

Tableau has a number of customization features, wonder if it is possible to optimize those settings to avoid this issue.

--Andries


> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee <sm...@commvault.com> wrote:
> 
> Hi,
> 
> Have anyone tried optimizing SUM(1) query in drill? Or is it implemented?
> Getting these query while using Tableau. Mostly probably it is trying to figure out NUMBER_OF_RECORDS.
> 
> Thanks,
> Sudip
> 
> 
> 
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for the
> sole use of the intended recipient. Any unauthorized review, use or distribution
> by others is strictly prohibited. If you have received the message by mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **********************************************************************