
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/05/10 15:03:38 UTC

[GitHub] [flink] fhueske commented on a change in pull request #8330: [FLINK-12388][docs] Update the production readiness checklist

fhueske commented on a change in pull request #8330: [FLINK-12388][docs] Update the production readiness checklist
URL: https://github.com/apache/flink/pull/8330#discussion_r282921904

##########
File path: docs/ops/production_ready.md
##########
@@ -22,79 +22,54 @@ specific language governing permissions and limitations
under the License.
-->

+The production readiness checklist provides an overview of configuration options that should be carefully considered before bringing an Apache Flink job into production.
+While the Flink community has attempted to provide sensible defaults for each configuration, it is important to review this list and ensure the options chosen are sufficient for your needs.
+
* ToC
{:toc}

-## Production Readiness Checklist
-
-The purpose of this production readiness checklist is to provide a condensed overview of configuration options that are
-important and need **careful considerations** if you plan to bring your Flink job into **production**. For most of these options
-Flink provides out-of-the-box defaults to make usage and adoption of Flink easier. For many users and scenarios, those
-defaults are good starting points for development and completely sufficient for "one-shot" jobs.
-
-However, once you are planning to bring a Flink application to production the requirements typically increase. For example,
-you want your job to be (re-)scalable and to have a good upgrade story for your job and new Flink versions.
-
-In the following, we present a collection of configuration options that you should check before your job goes into production.
-
-### Set maximum parallelism for operators explicitly
-
-Maximum parallelism is a configuration parameter that is newly introduced in Flink 1.2 and has important implications
-for the (re-)scalability of your Flink job. This parameter, which can be set on a per-job and/or per-operator granularity,
-determines the maximum parallelism to which you can scale operators. It is important to understand that (as of now) there
-is **no way to change** this parameter after your job has been started, except for restarting your job completely
-from scratch (i.e. with a new state, and not from a previous checkpoint/savepoint). Even if Flink would provide some way
-to change maximum parallelism for existing savepoints in the future, you can already assume that for large states this is
-likely a long-running operation that you want to avoid. At this point, you might wonder why not just use a very high
-value as default for this parameter. The reason behind this is that high maximum parallelism can have some impact on your
-application's performance and even state sizes, because Flink has to maintain certain metadata for its ability to rescale which
-can increase with the maximum parallelism. In general, you should choose a max parallelism that is high enough to fit your
-future needs in scalability, but keeping it as low as possible can give slightly better performance. In particular,
-a maximum parallelism higher than 128 will typically result in slightly bigger state snapshots from the keyed backends.
+### Set An Explicit Max Parallelism

-Notice that maximum parallelism must fulfill the following conditions:
+The max parallelism, set on a per-job and per-operator granularity, determines the maximum parallelism to which a stateful operator can scale.
+There is currently **no way to change** the maximum parallelism of an operator after a job has started without discarding that operator's state.

Review comment:
un-highlight "no way to change"? The other aspects have similar issues and are not highlighted.
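For context on why the value is fixed once a job has state: Flink partitions keyed state into a number of key groups equal to the max parallelism, and each parallel subtask owns a contiguous range of them. In the DataStream API the value is set with `StreamExecutionEnvironment#setMaxParallelism(int)` or per operator with `setMaxParallelism(int)`. The following is a minimal, self-contained sketch of that arithmetic that does not depend on Flink. It assumes the behavior of Flink's `KeyGroupRangeAssignment`: bounds of 128 and 32768, and a default of about 1.5x the parallelism rounded up to a power of two. The class `MaxParallelismSketch` and its methods are hypothetical names used only for illustration.

```java
/**
 * Hypothetical sketch (not Flink code) of how keyed state is split into key groups.
 * Assumes the arithmetic of Flink's KeyGroupRangeAssignment; all names are illustrative.
 */
final class MaxParallelismSketch {

    // Assumed bounds for the max parallelism: 2^7 and 2^15.
    static final int LOWER_BOUND = 1 << 7;   // 128
    static final int UPPER_BOUND = 1 << 15;  // 32768

    private MaxParallelismSketch() {}

    /** Smallest power of two that is >= x (valid for 1 <= x <= 2^30). */
    static int roundUpToPowerOfTwo(int x) {
        return x <= 1 ? 1 : Integer.highestOneBit(x - 1) << 1;
    }

    /** Assumed default when no max parallelism is set: ~1.5x parallelism, clamped to [128, 32768]. */
    static int defaultMaxParallelism(int parallelism) {
        int target = roundUpToPowerOfTwo(parallelism + parallelism / 2);
        return Math.min(Math.max(target, LOWER_BOUND), UPPER_BOUND);
    }

    /** Every subtask must own at least one key group, so parallelism cannot exceed max parallelism. */
    static void checkParallelism(int maxParallelism, int parallelism) {
        if (maxParallelism < 1 || maxParallelism > UPPER_BOUND) {
            throw new IllegalArgumentException("maxParallelism out of range: " + maxParallelism);
        }
        if (parallelism < 1 || parallelism > maxParallelism) {
            throw new IllegalArgumentException(
                "parallelism " + parallelism + " must be in [1, " + maxParallelism + "]");
        }
    }

    /** Index of the subtask that owns the given key group. */
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        checkParallelism(maxParallelism, parallelism);
        return keyGroup * parallelism / maxParallelism;
    }

    /** Inclusive range [start, end] of key groups owned by one subtask. */
    static int[] keyGroupRange(int maxParallelism, int parallelism, int operatorIndex) {
        checkParallelism(maxParallelism, parallelism);
        int start = (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        // Rescaling from 2 to 3 subtasks only moves whole key groups between subtasks;
        // the total number of key groups (128) stays the same.
        for (int parallelism : new int[] {2, 3}) {
            for (int i = 0; i < parallelism; i++) {
                int[] range = keyGroupRange(maxParallelism, parallelism, i);
                System.out.println("p=" + parallelism + " subtask " + i
                    + " -> key groups [" + range[0] + ", " + range[1] + "]");
            }
        }
    }
}
```

Under these assumptions, rescaling only reassigns whole key groups between subtasks. Changing the max parallelism would change the number of key groups itself, which means re-partitioning all keyed state stored in checkpoints and savepoints.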

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services