Posted to issues@flink.apache.org by "Piotr Nowojski (JIRA)" <ji...@apache.org> on 2018/09/12 10:16:00 UTC

[jira] [Commented] (FLINK-10320) Introduce JobMaster schedule micro-benchmark

    [ https://issues.apache.org/jira/browse/FLINK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611879#comment-16611879 ] 

Piotr Nowojski commented on FLINK-10320:
----------------------------------------

[~Tison] thanks for the proposal. Is scheduling time actually a problem? Is it time critical? I would guess that in most cases/scenarios/setups it is not. Have we seen a performance regression there? I'm asking because more benchmarks mean more code to maintain and more time spent executing them.

My main concern regarding the implementation is the mock/testing {{TaskExecutor}}s. It would be best either to reuse some existing testing code or to set up real ones, but if we want to benchmark scheduling for setups with hundreds of TMs, they have to be extremely quick/efficient; otherwise we would overload the machine executing the benchmark.
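
To illustrate how cheap such a testing {{TaskExecutor}} could be, here is a minimal plain-Java sketch (not Flink code; the class and method names are hypothetical stand-ins): it acknowledges deployment and reports FINISHED synchronously, so hundreds of instances add almost no overhead in a single JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-in for a mock/testing TaskExecutor: no threads, no I/O,
// every deployed task is reported FINISHED immediately and synchronously.
public class ImmediateTaskExecutor {
    private final AtomicInteger deployed = new AtomicInteger();

    // "Deploy" a task: skip all real work and synchronously report FINISHED
    // back to the caller (the JobMaster in the real setup).
    public void deployTask(String taskName, Consumer<String> onFinished) {
        deployed.incrementAndGet();
        onFinished.accept(taskName + " FINISHED");
    }

    public int deployedCount() {
        return deployed.get();
    }

    public static void main(String[] args) {
        ImmediateTaskExecutor te = new ImmediateTaskExecutor();
        for (int i = 0; i < 100; i++) {
            te.deployTask("task-" + i, status -> { /* JobMaster callback */ });
        }
        System.out.println("deployed and finished: " + te.deployedCount());
    }
}
```

Whether something this minimal is faithful enough to exercise the real scheduling path is exactly the design question above.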

> Introduce JobMaster schedule micro-benchmark
> --------------------------------------------
>
>                 Key: FLINK-10320
>                 URL: https://issues.apache.org/jira/browse/FLINK-10320
>             Project: Flink
>          Issue Type: Improvement
>          Components: Tests
>            Reporter: 陈梓立
>            Assignee: 陈梓立
>            Priority: Major
>
> Based on the {{org.apache.flink.streaming.runtime.io.benchmark}} stuff and the [flink-benchmark|https://github.com/dataArtisans/flink-benchmarks] repo, I propose to introduce another micro-benchmark which focuses on {{JobMaster}} scheduling performance.
> h3. Target
> Benchmark how long it takes from {{JobMaster}} startup (receiving the {{JobGraph}} and initializing) until all tasks are RUNNING. Technically we use a bounded stream and the TM finishes tasks as soon as they arrive, so the real interval we measure ends when all tasks are FINISHED.
> h3. Case
> 1. JobGraph that covers EAGER + PIPELINED edges
> 2. JobGraph that covers LAZY_FROM_SOURCES + PIPELINED edges
> 3. JobGraph that covers LAZY_FROM_SOURCES + BLOCKING edges
> ps: maybe also benchmark the case where the source reads from an {{InputSplit}}?
> h3. Implement
> Based on the flink-benchmark repo, we ultimately run the benchmark using jmh, so the whole test suite is separated into two repos. The testing environment could live in the main repo, maybe under flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/benchmark.
> To measure the performance of {{JobMaster}} scheduling, we need to simulate an environment that:
> 1. has a real {{JobMaster}}
> 2. has a mock/testing {{ResourceManager}} that has infinite resources and reacts immediately.
> 3. has one (or many?) mock/testing {{TaskExecutor}}s that deploy and finish tasks immediately.
> [~trohrmann@apache.org] [~GJL] [~pnowojski] could you please review this proposal to help clarify the goal and concrete details? Thanks in advance.
> Any suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)