Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2014/06/22 23:51:25 UTC

[jira] [Resolved] (FLINK-33) [GitHub] Rework instance configuration.

     [ https://issues.apache.org/jira/browse/FLINK-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-33.
-------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: pre-apache)
                   0.6-incubating

This is fixed in b4b633eab9a70e14d2e0dd5252f4b092a3689093

> [GitHub] Rework instance configuration.
> ---------------------------------------
>
>                 Key: FLINK-33
>                 URL: https://issues.apache.org/jira/browse/FLINK-33
>             Project: Flink
>          Issue Type: Bug
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: 0.6-incubating
>
>
> Right now, Nephele still uses the EC2-inspired instance configuration model. The Pact compiler connects to the job manager to obtain information about these instances, such as how many are available and how much memory they have. This model is error-prone to configure and also somewhat buggy: it frequently leads to wrong memory bookkeeping when different instance types are configured.
> Do we need support for heterogeneous setups, where different nodes have different capabilities and should be assigned different amounts of work? If we defer this, we can greatly simplify the logic and the configuration:
> 1) No configuration of instance types. The internal instance manager uses a single default profile that is suitable for all cluster instances.
> 2) An explicit value for the number of parallel operator slots on each node (for example, 8 on an eight-core machine). The config should provide a default value that query-specific parameters can override.
> 3) An explicit config entry that defines how much memory is used for networking and how much for query processing. The query processing memory is used to initialize the MemoryManager and is also used by the Pact compiler to determine the memory available to the operators. That way, we can also eliminate the communication between the compiler and the job manager during plan compilation. Eventually, it would be good to run the compiler as a child process of the job manager anyway.
> In the long run, we want to merge query processing memory and network memory into a single value (overall system memory; the rest of the Java heap is left for UDFs), which is shared for materialization by the network stack and the runtime operators.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/33
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: 
> Created at: Wed Jun 12 02:58:27 CEST 2013
> State: open
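The simplified model in points 2 and 3 of the issue can be illustrated with a minimal sketch. It assumes a single uniform profile for all nodes and an even split of the processing memory, first across slots and then across the memory-consuming operators (sorts, hash tables) running in one slot. That split policy and the names NodeConfig, effective_slots, and operator_memory_mb are hypothetical; they are not taken from the Nephele or Pact code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical uniform per-node settings (one default profile for all nodes)."""
    slots: int                 # parallel operator slots, e.g. 8 on an eight-core machine
    network_memory_mb: int     # reserved for the network stack; not part of operator budgets
    processing_memory_mb: int  # would be used to initialize the MemoryManager


def effective_slots(config: NodeConfig, query_slots: Optional[int] = None) -> int:
    """Return the configured slot count unless a query-specific value overrides it."""
    if query_slots is None:
        return config.slots
    if query_slots <= 0:
        raise ValueError("query-specific slot count must be positive")
    return query_slots


def operator_memory_mb(config: NodeConfig, memory_consumers: int,
                       query_slots: Optional[int] = None) -> int:
    """Memory budget (MB) for each memory-consuming operator in one slot.

    Assumption for illustration: processing memory is divided evenly across
    slots, then evenly across the memory consumers within a slot. Only the
    static config is read, so no call to the job manager is needed.
    """
    if memory_consumers <= 0:
        raise ValueError("need at least one memory-consuming operator")
    per_slot = config.processing_memory_mb // effective_slots(config, query_slots)
    return per_slot // memory_consumers
```

With NodeConfig(slots=8, network_memory_mb=512, processing_memory_mb=4096), operator_memory_mb(config, 2) yields 256 MB per operator, and a query that overrides the slot count to 4 gets 512 MB per operator. Because the budget is computed from configuration alone, the compiler in this sketch never has to contact the job manager.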



--
This message was sent by Atlassian JIRA
(v6.2#6252)