Posted to user@flink.apache.org by 祁明良 <mq...@xiaohongshu.com> on 2018/12/09 10:20:33 UTC

Question regarding rescale api

Hi All,

I see that the rescale API lets us redistribute elements locally, but is it possible to make the upstream operator's subtasks spread evenly across the task managers?
For example, I have 10 task managers, each with 10 slots. The application reads data from a Kafka topic with 20 partitions, then rescales it to full parallelism. It seems to me that the 20 slots needed to read from Kafka won't be distributed evenly across the 10 task managers, which means the subsequent rescale still needs to shuffle data over the network.
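
To make the scenario concrete, here is a minimal sketch in plain Java (no Flink dependency) of how rescale pairs upstream and downstream subtasks when the downstream parallelism is a multiple of the upstream one. The class and method names are hypothetical, and the contiguous index pairing is an assumption made for illustration; the point is only that each of the 20 source subtasks feeds 100 / 20 = 5 downstream subtasks, and the data stays local only if those 5 happen to run on the same task manager as their source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: models the rescale pairing for the
// case where the downstream parallelism is a multiple of the upstream one.
final class RescaleSketch {

    // Returns the downstream subtask indices fed by one upstream subtask.
    // Assumption: each upstream subtask owns a contiguous block of targets.
    static List<Integer> targets(int upstreamIndex, int upstreamParallelism, int downstreamParallelism) {
        if (downstreamParallelism % upstreamParallelism != 0) {
            throw new IllegalArgumentException("sketch only covers the even-multiple case");
        }
        int fanOut = downstreamParallelism / upstreamParallelism;
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < fanOut; i++) {
            result.add(upstreamIndex * fanOut + i);
        }
        return result;
    }

    public static void main(String[] args) {
        // 20 Kafka source subtasks rescaled to 100 downstream subtasks.
        System.out.println(targets(0, 20, 100));   // [0, 1, 2, 3, 4]
        System.out.println(targets(19, 20, 100));  // [95, 96, 97, 98, 99]
    }
}
```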


Best,
Mingliang

This communication may contain privileged or other confidential information of Red. If you have received it in error, please advise the sender by reply e-mail and immediately delete the message and any attachments without copying or disclosing the contents. Thank you.

Re: Question regarding rescale api

Posted by Till Rohrmann <tr...@apache.org>.
Hi Mingliang,

Aljoscha is right. At the moment, Flink does not support spreading tasks
out evenly across all TaskManagers. This is a feature which we still need
to add. Until then, you need to set the parallelism to the number of
available slots in order to guarantee that all TaskManagers are used equally.
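
As a rough illustration of why matching the parallelism to the slot count helps, the sketch below counts how many slots per TaskManager host a source subtask under a hypothetical worst case in which one TaskManager is filled completely before the next one is used. This packing rule is an assumption for the sake of the example, not a description of Flink's actual scheduler, and the class and method names are made up.

```java
import java.util.Arrays;

// Hypothetical model, not Flink code: counts occupied slots per TaskManager.
final class SlotSpreadSketch {

    // Assumed worst case: fill one TaskManager completely before the next.
    static int[] packed(int taskManagers, int slotsPerTaskManager, int parallelism) {
        if (parallelism > taskManagers * slotsPerTaskManager) {
            throw new IllegalArgumentException("not enough slots for the requested parallelism");
        }
        int[] used = new int[taskManagers];
        int remaining = parallelism;
        for (int tm = 0; tm < taskManagers && remaining > 0; tm++) {
            used[tm] = Math.min(slotsPerTaskManager, remaining);
            remaining -= used[tm];
        }
        return used;
    }

    public static void main(String[] args) {
        // Source parallelism 20 on 10 TaskManagers with 10 slots each.
        System.out.println(Arrays.toString(packed(10, 10, 20)));
        // prints [10, 10, 0, 0, 0, 0, 0, 0, 0, 0]

        // Source parallelism equal to the total number of slots (100).
        System.out.println(Arrays.toString(packed(10, 10, 100)));
        // prints [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
    }
}
```

With parallelism equal to the total slot count, every slot is occupied regardless of the order in which slots are handed out, so the TaskManagers are necessarily used equally.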

Cheers,
Till

On Mon, Dec 10, 2018 at 3:18 PM Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
>
> I think that, with the way tasks are currently assigned to slots, there is
> no way of ensuring that the source tasks are spread evenly across the
> TaskManagers (TaskExecutors). The rescale() API is from a time when
> scheduling worked a bit differently in Flink, I'm afraid.
>
> I'm cc'ing Till, who might know more about scheduling.
>
> Best,
> Aljoscha

Re: Question regarding rescale api

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

I think that, with the way tasks are currently assigned to slots, there is no way of ensuring that the source tasks are spread evenly across the TaskManagers (TaskExecutors). The rescale() API is from a time when scheduling worked a bit differently in Flink, I'm afraid.

I'm cc'ing Till, who might know more about scheduling.

Best,
Aljoscha


> On 10. Dec 2018, at 13:02, 祁明良 <mq...@xiaohongshu.com> wrote:
> 
> 
> Hi Aljoscha,
> 
> It seems you are the committer of the rescale API. Could you help with this question?
> 
> Best,
> Mingliang