Posted to issues@drill.apache.org by "Chunhui Shi (JIRA)" <ji...@apache.org> on 2017/01/05 19:26:58 UTC
[jira] [Commented] (DRILL-5177) Query Planning takes infinite time in case drill is connected to Mongo Sharded environment

    [ https://issues.apache.org/jira/browse/DRILL-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802274#comment-15802274 ] 

Chunhui Shi commented on DRILL-5177:
------------------------------------

Not sure if this is related (https://issues.apache.org/jira/browse/DRILL-4882), but you may want to give it a try. There is not much that is MongoDB-specific in the planning stage, except for the filter push-down optimization rule. So either Drill has difficulty getting schema information from MongoDB or from other storage plugins (possibly due to DRILL-4882), or the filter push-down itself is slow. It will be easier for us to look into this issue if we have more information, e.g. a debug-level drillbit.log, the profile of this query, or the repro steps, including configurations such as the MongoDB configuration and the Mongo storage plugin configuration in Drill.
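For readers unfamiliar with the filter push-down step mentioned above: the rule rewrites a SQL WHERE clause into a MongoDB query document so that MongoDB, rather than Drill, applies the filter. The following is a minimal, self-contained sketch under that assumption. `FilterPushDownSketch`, `Predicate`, and `toMongoFilter` are hypothetical names, and this is not the actual code of Drill's MongoDB storage plugin. It handles only a conjunction of simple comparisons and does not escape string literals.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not Drill's MongoDB plugin code): turns a conjunction of
 * simple SQL comparisons into a MongoDB query document rendered as JSON, which
 * is the kind of output a filter push-down rule hands to MongoDB.
 */
public class FilterPushDownSketch {

    /** One comparison such as {@code qty > 30}; illustrative type only. */
    public record Predicate(String field, String op, Object value) {}

    /** Maps a SQL comparison operator to the matching MongoDB query operator. */
    static String mongoOperator(String sqlOp) {
        return switch (sqlOp) {
            case "=" -> "$eq";
            case "<>" -> "$ne";
            case ">" -> "$gt";
            case ">=" -> "$gte";
            case "<" -> "$lt";
            case "<=" -> "$lte";
            default -> throw new IllegalArgumentException("Cannot push down operator: " + sqlOp);
        };
    }

    /** Renders a literal as JSON: strings are quoted (without escaping), numbers are not. */
    static String literal(Object value) {
        return value instanceof String s ? "\"" + s + "\"" : String.valueOf(value);
    }

    /** Builds {"$and": [{field: {op: value}}, ...]} for a list of ANDed predicates. */
    public static String toMongoFilter(List<Predicate> conjuncts) {
        String clauses = conjuncts.stream()
            .map(p -> "{\"" + p.field() + "\": {\"" + mongoOperator(p.op()) + "\": "
                      + literal(p.value()) + "}}")
            .collect(Collectors.joining(", "));
        return "{\"$and\": [" + clauses + "]}";
    }

    public static void main(String[] args) {
        // Equivalent of: WHERE status = 'A' AND qty > 30
        List<Predicate> where = List.of(
            new Predicate("status", "=", "A"),
            new Predicate("qty", ">", 30));
        // Prints {"$and": [{"status": {"$eq": "A"}}, {"qty": {"$gt": 30}}]}
        System.out.println(toMongoFilter(where));
    }
}
```

If building such a document were unexpectedly expensive for some predicate shapes, that cost would show up during planning rather than execution, which is why the rule is one of the suspects named above.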

> Query Planning takes infinite time in case drill is connected to  Mongo Sharded environment
> -------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5177
>                 URL: https://issues.apache.org/jira/browse/DRILL-5177
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - MongoDB
>    Affects Versions: 1.8.0
>         Environment: 1) There were 4 drillbits (with 3 ZooKeeper nodes) and 4 mongod instances in the cluster. However, the drillbits and the mongod instances were not co-located on the same physical servers.
> 2) The shard key was evenly distributed among 4 shards (mongod)
>            Reporter: Mridul Chopra
>
> When Drill is connected to a sharded MongoDB environment (through mongos), query execution time is much higher than when it is connected directly to a mongod, even though the data volume behind mongos and mongod is roughly the same. The root cause appears to be the query planning time.
> On mongos:
> Collection size: 200 GB, record count: 230,083,160
> A simple SELECT query with a filter on an indexed column ran for more than 50 minutes, and it remained in the "STARTING" state for the first 40 minutes. Further analysis showed that query planning was taking a very long time.
> The issue was localized to the following code:
> Class name: DefaultSqlHandler.java
> Method: protected RelNode transform(PlannerType plannerType, PlannerPhase phase, RelNode input, RelTraitSet targetTraits, boolean log)
> Line 384: output = program.run(planner, input, toTraits);
> The output of this line is produced by the VolcanoPlanner class (package org.apache.calcite.plan.volcano), which takes a very long time to plan the query. This happens only in the mongos environment.
> When the same SELECT query was executed directly against mongod:
> Collection size: 306 GB, record count: 49,924,351
> Query execution completed within 2 minutes, and the line above returned its output within seconds.
> Although the data volume on mongod (roughly 300 GB) was higher than on mongos (200 GB), query planning was much faster on mongod. There seems to be an issue with query planning in the mongos environment.
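To confirm on another cluster that the time really goes into the `program.run(planner, input, toTraits)` call at line 384 of DefaultSqlHandler, one could time that single step and log the result. This is a minimal, self-contained sketch: `PhaseTimer`, `Timed`, and `timed` are hypothetical names, not Drill or Calcite APIs, and the planner call is simulated with a short sleep.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not a Drill API) that measures how long one planning
 * step takes, e.g. the program.run(...) call inside DefaultSqlHandler.transform.
 */
public final class PhaseTimer {

    private PhaseTimer() {}

    /** The step's return value together with its elapsed wall-clock time. */
    public record Timed<T>(T value, long elapsedMillis) {}

    /** Runs the step and times it with the monotonic System.nanoTime() clock. */
    public static <T> Timed<T> timed(String phaseName, Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // A real drillbit would write this to its debug-level log instead of stderr.
        System.err.printf("%s took %d ms%n", phaseName, elapsed);
        return new Timed<>(result, elapsed);
    }

    public static void main(String[] args) {
        // Stand-in for the planner call: sleep briefly to simulate planning work.
        Timed<String> t = timed("simulated planning phase", () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "plan";
        });
        System.out.println(t.value() + " after " + t.elapsedMillis() + " ms");
    }
}
```

Inside `transform`, the call could be wrapped as `PhaseTimer.timed(String.valueOf(phase), () -> program.run(planner, input, toTraits)).value()`, assuming the variables the lambda captures are effectively final. Running the same query against mongos and against mongod would then show whether this one step accounts for the difference.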


