You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Mridul Chopra (JIRA)" <ji...@apache.org> on 2017/01/05 03:37:58 UTC

[jira] [Created] (DRILL-5177) Query Planning takes infinite time in case drill is connected to Mongo Sharded environment

Mridul Chopra created DRILL-5177:
------------------------------------

             Summary: Query Planning takes infinite time in case drill is connected to  Mongo Sharded environment
                 Key: DRILL-5177
                 URL: https://issues.apache.org/jira/browse/DRILL-5177
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization, Storage - MongoDB
    Affects Versions: 1.8.0
         Environment: 1) There were 4 drillbits (with 3 zookeeper) and 4 mongod’s in the cluster. However the drillbits and mongod’s were not located on the same physical server.
2) The shard key was evenly distributed among 4 shards (mongod)
            Reporter: Mridul Chopra


When Drill is connected to Sharded Mongo environment (mongoS), then the query execution time is very high as compared to query execution time on mongod (even though the volume of data on mongoS and mongoD is almost same). The root cause behind the same can be linked with the query planning time.

On MongoS
Collection Size : - 200 GB, Record Count  : 230,083,160
A simple select query with a filter on indexed column was executed, but then the query was under execution for more than 50 minutes. The query state was "STARTING" until 40 minutes. Upon further analysis, it was revealed that query planning took very long. 

Below are the details where this issue was localised -

Class Name : DefaultSqlHandler.java
Method Name : protected RelNode transform(PlannerType plannerType, PlannerPhase phase, RelNode input, RelTraitSet targetTraits,
      boolean log) 
Line No : 384 :    output = program.run(planner, input, toTraits);

The output from the above line is returned by VolcanoPlanner class 
(package: org.apache.calcite.plan.volcano) which takes huge time for query planning. This is only in case of MongoS environment.

When the same select query was executed on MongoD environment
CollectionSize: 306 GB  Record Count : 49,924,351
Query execution was completed within 2 minutes and above line returned the output within seconds.

Given that the data volume was high (300 GB) on mongoD as compared to MongoS(200GB), but the query planning was much faster on MongoD. There seems to be some issue with query planning for MongoS environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)