Posted to dev@flink.apache.org by Fabian Hueske <fh...@gmail.com> on 2015/03/12 10:20:50 UTC

Re: [jira] [Commented] (FLINK-1679) Document how "degree of parallelism" / "parallelism" / "slots" are connected to each other

+1 for going consistently with parallelism. However, these are API-breaking
changes and we need to mark them deprecated before throwing them out, IMO.

I am not comfortable with using AUTOMAX as a default. This is fine on
dedicated setups like YARN sessions, but will consume all available
resources of a cluster if a user forgets to set the -p flag (or fix the DOP
in the program). There is already a default-parallelism flag in the config
and that value should be used, IMO.
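
For illustration only, here is a minimal sketch of the fallback order argued for above. It assumes that a value fixed in the program wins over the -p flag, and that the configured default is used only when neither is set. The class and method names are hypothetical and are not part of Flink.

```java
import java.util.OptionalInt;

/** Hypothetical helper for illustration; not part of Flink's API. */
final class ParallelismResolver {
    private ParallelismResolver() {}

    /**
     * Picks the effective parallelism of a job.
     * Assumed precedence: value fixed in the program, then the -p flag,
     * then the default from the configuration.
     */
    static int resolve(OptionalInt programParallelism,
                       OptionalInt cliParallelism,
                       int configDefault) {
        if (configDefault < 1) {
            throw new IllegalArgumentException("configDefault must be at least 1");
        }
        if (programParallelism.isPresent()) {
            return programParallelism.getAsInt();
        }
        if (cliParallelism.isPresent()) {
            return cliParallelism.getAsInt();
        }
        // Neither the program nor the CLI set a value: fall back to the config.
        return configDefault;
    }
}
```

With this order, a user who forgets -p gets the configured default rather than every slot in the cluster.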

2015-03-12 10:07 GMT+01:00 Robert Metzger (JIRA) <ji...@apache.org>:

>
>     [
> https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358345#comment-14358345
> ]
>
> Robert Metzger commented on FLINK-1679:
> ---------------------------------------
>
> I would suggest removing all occurrences of "degreeOfParallelism" in the
> system and replacing them with "parallelism" everywhere.
> The CLI frontend, for example, also calls it {{-p}}, not {{-dop}}.
>
> I would also suggest setting the parallelism to {{AUTOMAX}} by default in
> the CliFrontend.
>
> > Document how "degree of parallelism" /  "parallelism" / "slots" are
> connected to each other
> >
> -------------------------------------------------------------------------------------------
> >
> >                 Key: FLINK-1679
> >                 URL: https://issues.apache.org/jira/browse/FLINK-1679
> >             Project: Flink
> >          Issue Type: Task
> >          Components: Documentation
> >    Affects Versions: 0.9
> >            Reporter: Robert Metzger
> >            Assignee: Ufuk Celebi
> >
> > I see too many users being confused about properly setting up Flink with
> respect to parallelism.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>

Re: [jira] [Commented] (FLINK-1679) Document how "degree of parallelism" / "parallelism" / "slots" are connected to each other

Posted by Stephan Ewen <se...@apache.org>.

+1 for consistently calling it parallelism

-1 for AUTOMAX as the default

On Thu, Mar 12, 2015 at 10:31 AM, Robert Metzger <rm...@apache.org>
wrote:

> We can also make the change non-API breaking by adding an additional method
> and deprecating the old one.
>
>
> Why would the AUTOMAX parallelism eat up all cluster resources? It would
> only allocate all slots WITHIN the Flink cluster.
> Those users (=new users) who would benefit from the AUTOMAX parallelism
> have probably set the parallelism per TaskManager to 1 anyways.
> Advanced users will set their parallelism / slots configuration anyways
> properly.
>
> In my experience, most users:
> - have exclusive access to a test cluster in the beginning (I don't think
> anybody who doesn't know the system at all would start Flink on a
> production cluster)
> - or use YARN
> - do not set any parallelism for jobs or slots per TaskManager.
>
> From these observations, I would actually set the number of slots on the
> TaskManagers to the number of available CPUs.
> And for the CLI frontend, I would by default let a job use all available
> slots (most users don't know that Flink allows running multiple jobs at the
> same time).
>
> If users want to change the behavior, they have to look into the
> documentation.

Re: [jira] [Commented] (FLINK-1679) Document how "degree of parallelism" / "parallelism" / "slots" are connected to each other

Posted by Stephan Ewen <se...@apache.org>.

YARN does not have that problem, because YARN already sets the default
parallelism to all slots.


On Thu, Mar 12, 2015 at 11:19 AM, Maximilian Michels <mx...@apache.org> wrote:

> +1 for unifying the way to set the parallelism and deprecating the old
> methods.
>
> We had the AUTOMAX discussion before in the corresponding pull
> request. It seems to me that there are two orthogonal views on how
> resources should be allocated by default. I strongly agree with
> Robert.
>
> Users have exclusive access to resources or use a resource manager
> (YARN). They are often unaware of the parallelism and are turned off
> by the bad performance with parallelism of 1. Setting AUTOMAX by
> default gives the best possible Flink experience. After all, Flink
> doesn't even support proper sharing of resources at the moment. So
> scenarios where multiple users manually set the parallelism will cause
> problems with job canceling due to unavailable resources and missing
> queuing features.
>
> Let's leave it up to the advanced users to set the granularity of the
> parallelism and provide the best out of the box experience for Flink
> novices.
>
> Best regards,
> Max

Re: [jira] [Commented] (FLINK-1679) Document how "degree of parallelism" / "parallelism" / "slots" are connected to each other

Posted by Maximilian Michels <mx...@apache.org>.

+1 for unifying the way to set the parallelism and deprecating the old methods.

We had the AUTOMAX discussion before in the corresponding pull
request. It seems to me that there are two orthogonal views on how
resources should be allocated by default. I strongly agree with
Robert.

Users have exclusive access to resources or use a resource manager
(YARN). They are often unaware of the parallelism and are turned off
by the bad performance with parallelism of 1. Setting AUTOMAX by
default gives the best possible Flink experience. After all, Flink
doesn't even support proper sharing of resources at the moment. So
scenarios where multiple users manually set the parallelism will cause
problems with job canceling due to unavailable resources and missing
queuing features.
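
For illustration only, the sketch below shows the failure mode described above: without queuing, a job that asks for more slots than are currently free is rejected immediately. It assumes a job needs as many slots as its parallelism. The SlotPool class is hypothetical and does not reflect Flink's scheduler.

```java
/** Hypothetical slot accounting; illustrates "no queuing", not Flink's scheduler. */
final class SlotPool {
    private final int totalSlots;
    private int usedSlots;

    SlotPool(int totalSlots) {
        if (totalSlots < 1) {
            throw new IllegalArgumentException("totalSlots must be at least 1");
        }
        this.totalSlots = totalSlots;
    }

    /** Reserves slots for a job, or returns false right away if too few are free. */
    boolean tryAllocate(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        if (parallelism > totalSlots - usedSlots) {
            return false; // no queue: the submission simply fails
        }
        usedSlots += parallelism;
        return true;
    }

    /** Returns the slots of a finished job to the pool. */
    void release(int parallelism) {
        if (parallelism < 1 || parallelism > usedSlots) {
            throw new IllegalArgumentException("cannot release " + parallelism + " slots");
        }
        usedSlots -= parallelism;
    }

    int freeSlots() {
        return totalSlots - usedSlots;
    }
}
```

On a pool of 8 slots, a first job with parallelism 6 is accepted, and a second job with parallelism 4 is rejected until the first one releases its slots.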

Let's leave it up to the advanced users to set the granularity of the
parallelism and provide the best out of the box experience for Flink
novices.

Best regards,
Max

On Thu, Mar 12, 2015 at 10:31 AM, Robert Metzger <rm...@apache.org> wrote:
> We can also make the change non-API breaking by adding an additional method
> and deprecating the old one.
>
>
> Why would the AUTOMAX parallelism eat up all cluster resources? It would
> only allocate all slots WITHIN the Flink cluster.
> Those users (=new users) who would benefit from the AUTOMAX parallelism
> have probably set the parallelism per TaskManager to 1 anyways.
> Advanced users will set their parallelism / slots configuration anyways
> properly.
>
> In my experience, most users:
> - have exclusive access to a test cluster in the beginning (I don't think
> anybody who doesn't know the system at all would start Flink on a
> production cluster)
> - or use YARN
> - do not set any parallelism for jobs or slots per TaskManager.
>
> From these observations, I would actually set the number of slots on the
> TaskManagers to the number of available CPUs.
> And for the CLI frontend, I would by default let a job use all available
> slots (most users don't know that Flink allows running multiple jobs at the
> same time).
>
> If users want to change the behavior, they have to look into the
> documentation.

Re: [jira] [Commented] (FLINK-1679) Document how "degree of parallelism" / "parallelism" / "slots" are connected to each other

Posted by Robert Metzger <rm...@apache.org>.

We can also make the change non-API breaking by adding an additional method
and deprecating the old one.
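
For illustration only, a minimal sketch of that approach: the new method carries the consistent name, and the old one is kept as a deprecated alias that delegates to it. The JobSettings class is a hypothetical stand-in, not Flink's actual code, and the method names are chosen only to mirror the naming discussed in this thread.

```java
/** Hypothetical stand-in for a class with a parallelism setting; not Flink's code. */
class JobSettings {
    private int parallelism = 1;

    /** New, consistently named setter. */
    public void setParallelism(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        this.parallelism = parallelism;
    }

    /** New, consistently named getter. */
    public int getParallelism() {
        return parallelism;
    }

    /**
     * Old name, kept so existing programs still compile.
     *
     * @deprecated Use {@link #setParallelism(int)} instead.
     */
    @Deprecated
    public void setDegreeOfParallelism(int degreeOfParallelism) {
        setParallelism(degreeOfParallelism);
    }

    /**
     * Old name, kept so existing programs still compile.
     *
     * @deprecated Use {@link #getParallelism()} instead.
     */
    @Deprecated
    public int getDegreeOfParallelism() {
        return getParallelism();
    }
}
```

Existing callers keep working and get a compiler warning, and the old methods can be removed in a later release.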


Why would the AUTOMAX parallelism eat up all cluster resources? It would
only allocate all slots WITHIN the Flink cluster.
Those users (=new users) who would benefit from the AUTOMAX parallelism
have probably set the parallelism per TaskManager to 1 anyways.
Advanced users will set their parallelism / slots configuration anyways
properly.

In my experience, most users:
- have exclusive access to a test cluster in the beginning (I don't think
anybody who doesn't know the system at all would start Flink on a
production cluster)
- or use YARN
- do not set any parallelism for jobs or slots per TaskManager.

From these observations, I would actually set the number of slots on the
TaskManagers to the number of available CPUs.
And for the CLI frontend, I would by default let a job use all available
slots (most users don't know that Flink allows running multiple jobs at the
same time).
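
For illustration only, the arithmetic behind these two defaults can be sketched as follows. It assumes every TaskManager offers the same number of slots and that an AUTOMAX job takes all slots of the Flink cluster. The SlotDefaults class and its methods are hypothetical.

```java
/** Hypothetical helper for illustration; not part of Flink's API. */
final class SlotDefaults {
    private SlotDefaults() {}

    /** Proposed default: one slot per CPU visible to the JVM, at least one. */
    static int defaultSlotsPerTaskManager() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    /** Total slots in the cluster, assuming identical TaskManagers. */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        if (taskManagers < 1 || slotsPerTaskManager < 1) {
            throw new IllegalArgumentException("both arguments must be at least 1");
        }
        return Math.multiplyExact(taskManagers, slotsPerTaskManager);
    }

    /** Parallelism an AUTOMAX job would get: all slots within the Flink cluster. */
    static int autoMaxParallelism(int taskManagers, int slotsPerTaskManager) {
        return totalSlots(taskManagers, slotsPerTaskManager);
    }
}
```

For example, 4 TaskManagers with 8 slots each give 32 slots, so an AUTOMAX job would run with parallelism 32 and leave no slot for a second job.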

If users want to change the behavior, they have to look into the
documentation.

On Thu, Mar 12, 2015 at 10:20 AM, Fabian Hueske <fh...@gmail.com> wrote:

> +1 for going consistently with parallelism. However, these are API-breaking
> changes and we need to mark them deprecated before throwing them out, IMO.
>
> I am not comfortable with using AUTOMAX as a default. This is fine on
> dedicated setups like YARN sessions, but will consume all available
> resources of a cluster if a user forgets to set the -p flag (or fix the DOP
> in the program). There is already a default-parallelism flag in the config
> and that value should be used, IMO.