Posted to dev@hawq.apache.org by Hubert Zhang <hz...@pivotal.io> on 2016/07/14 14:13:08 UTC

How about handle stages with different strategies

Hi, all
  In HAWQ, all stages of a query are currently treated the same way, both
from the scheduler's point of view and in the number of processes they use.
  Some other systems, such as Presto, have two schedulers: one is
sourcePartitionedScheduler, used to dispatch the scan stage, and the other is
FixedCountScheduler, used to dispatch intermediate stages.
  I think that design is more flexible. By flexible I mean that we could write
a new scanScheduler that dispatches at the split level, so that nodes that are
faster than others scan more splits. This strategy may
reduce the average IO time.
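
  As a rough illustration of the split-level idea, here is a minimal Python
sketch. It is not HAWQ or Presto code: the function names, the node speeds
(in splits per unit of time), and the equal split sizes are all assumptions
made up for the example. It compares handing every node the same number of
splits with handing each split to whichever node becomes idle first, and it
measures the time at which the slowest node finishes rather than the average
IO time.

```python
import heapq


def static_assign(num_splits, node_speeds):
    """Baseline: every node gets (almost) the same number of splits."""
    nodes = list(node_speeds)
    counts = {node: 0 for node in nodes}
    for i in range(num_splits):
        counts[nodes[i % len(nodes)]] += 1
    return counts


def split_level_assign(num_splits, node_speeds):
    """Hypothetical split-level scheduler: each split goes to the node
    that becomes idle first, so faster nodes end up scanning more splits."""
    # Heap entries are (time at which the node is idle again, node name).
    heap = [(0.0, node) for node in node_speeds]
    heapq.heapify(heap)
    counts = {node: 0 for node in node_speeds}
    for _ in range(num_splits):
        idle_at, node = heapq.heappop(heap)
        counts[node] += 1
        # One split takes 1 / speed time units on this node (assumed model).
        heapq.heappush(heap, (idle_at + 1.0 / node_speeds[node], node))
    return counts


def finish_time(counts, node_speeds):
    """Time at which the last node finishes its assigned splits."""
    return max(counts[node] / node_speeds[node] for node in counts)


if __name__ == "__main__":
    speeds = {"fast": 2.0, "slow": 1.0}  # assumed relative scan speeds
    for assign in (static_assign, split_level_assign):
        counts = assign(12, speeds)
        print(assign.__name__, counts, finish_time(counts, speeds))
```

  In this toy model the faster node ends up with more of the 12 splits and the
scan stage finishes earlier. A real scheduler would also have to consider
data locality and splits of different sizes, which the sketch ignores.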
  Are there any suggestions?

-- 
Thanks

Hubert Zhang

Re: How about handle stages with different strategies

Posted by Lili Ma <li...@apache.org>.
Hi Hubert,

I have some questions about your proposal. Do you mean that we would
have two schedulers, one for slices that include a scan operator and the other
for slices that do not?  Then, for one query, the two
schedulers would work together?
I guess what you suggest is that we can assign scan tasks according to the
type of each node, since, say, the disk IO capability of different nodes is
not the same, right?  Another possible benefit is that we could have different
virtual segments for scan slices and non-scan slices, right? I think the
second one can be converted to M*N dispatching support, that is, different
slices can have different virtual segments.
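
To make the M*N point concrete, here is a small Python sketch. It is only an
illustration under my own assumptions, not HAWQ code: it reads "different
virtual segments" as a different number of virtual segments per slice, and
the function names, the splits_per_vseg parameter, and the fixed count for
non-scan slices are all hypothetical.

```python
def vseg_counts(slices, num_splits, splits_per_vseg, fixed_count):
    """Pick a virtual segment count for each slice.

    slices is a list of (slice_name, has_scan) pairs. Scan slices are
    sized from the number of splits; other slices get a fixed count.
    """
    counts = {}
    for name, has_scan in slices:
        if has_scan:
            # Ceiling division, with at least one virtual segment.
            counts[name] = max(1, -(-num_splits // splits_per_vseg))
        else:
            counts[name] = fixed_count
    return counts


def connections(num_senders, num_receivers):
    """All sender-to-receiver links between an M-way and an N-way slice."""
    return [(s, r) for s in range(num_senders) for r in range(num_receivers)]


if __name__ == "__main__":
    plan = [("scan", True), ("join", False)]  # assumed two-slice plan
    counts = vseg_counts(plan, num_splits=10, splits_per_vseg=4, fixed_count=2)
    print(counts)
    print(connections(counts["scan"], counts["join"]))
```

With M scan-side virtual segments and N virtual segments in the next slice,
the data movement between the two slices needs M*N links instead of assuming
the same count on both sides.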

Thanks
Lili
