You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/04/06 01:49:25 UTC

[jira] [Commented] (DRILL-4446) Improve current fragment parallelization module

    [ https://issues.apache.org/jira/browse/DRILL-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227380#comment-15227380 ] 

ASF GitHub Bot commented on DRILL-4446:
---------------------------------------

Github user jacques-n commented on the pull request:

    https://github.com/apache/drill/pull/403#issuecomment-206039007
  
    This looks good to me. Let's get this merged. +1


> Improve current fragment parallelization module
> -----------------------------------------------
>
>                 Key: DRILL-4446
>                 URL: https://issues.apache.org/jira/browse/DRILL-4446
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: 1.5.0
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>             Fix For: 1.7.0
>
>
> Current fragment parallelizer {{SimpleParallelizer.java}} can’t handle correctly the case where an operator has mandatory scheduling requirement for a set of DrillbitEndpoints and affinity for each DrillbitEndpoint (i.e how much portion of the total tasks to be scheduled on each DrillbitEndpoint). It assumes that scheduling requirements are soft (except one case where Mux and DeMux case where mandatory parallelization requirement of 1 unit). 
> An example is:
> Cluster has 3 nodes running Drillbits and storage service on each. Data for a table is only present at storage services in two nodes. So a GroupScan needs to be scheduled on these two nodes in order to read the data. Storage service doesn't support (or costly) reading data from remote node.
> Inserting the mandatory scheduling requirements within existing SimpleParallelizer is not sufficient as you may end up with a plan that has a fragment with two GroupScans each having its own hard parallelization requirements.
> Proposal is:
> Add a property to each operator which tells what parallelization implementation to use. Most operators don't have any particular strategy (such as Project or Filter), they depend on incoming operator. Current existing operators which have requirements (all existing GroupScans) default to current parallelizer {{SimpleParallelizer}}. {{Screen}} defaults to new mandatory assignment parallelizer. It is possible that PhysicalPlan generated can have a fragment with operators having different parallelization strategies. In that case an exchange is inserted in between operators where a change in parallelization strategy is required.
> Will send a detailed design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)