You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Robert Metzger (JIRA)" <ji...@apache.org> on 2019/02/28 12:48:00 UTC
[jira] [Updated] (FLINK-9597) Flink application does not scale as expected

     [ https://issues.apache.org/jira/browse/FLINK-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-9597:
----------------------------------
    Component/s:     (was: Core)

> Flink application does not scale as expected
> --------------------------------------------
>
>                 Key: FLINK-9597
>                 URL: https://issues.apache.org/jira/browse/FLINK-9597
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: swy
>            Priority: Major
>         Attachments: JM.png, TM.png, flink_app_parser_git.zip, sample.png, scaleNotWork.png
>
>
> Hi, we found that our Flink application with simple logic, which using process function is not scale-able when scale from 8 parallelism onward even though with sufficient resources. Below it the result which is capped at ~250k TPS. No matter how we tune the parallelism of the operators it just not scale, same to increase source parallelism.
> Please refer to "scaleNotWork.png",
> 1. fixed source parallelism 4, other operators parallelism 8
> 2. fixed source parallelism 4, other operators parallelism 16
> 3. fixed source parallelism 4, other operators parallelism 32
> 4. fixed source parallelism 6, other operators parallelism 8
> 5. fixed source parallelism 6, other operators parallelism 16
> 6. fixed source parallelism 6, other operators parallelism 32
> 7. fixed source parallelism 6, other operators parallelism 64 performance worse than parallelism 32.
> Sample source code attached(flink_app_parser_git.zip). It is a simple program, parsing json record into object, and pass it to a empty logic Flink's process function. Rocksdb is in used, and the source is generated by the program itself. This could be reproduce easily. 
> We choose Flink because of it scalability, but this is not the case now, appreciated if anyone could help as this is impacting our projects! thank you.
> To run the program, sample parameters,
> "aggrinterval=6000000 loop=7500000 statsd=1 psrc=4 pJ2R=32 pAggr=72 URL=do36.mycompany.com:8127"
> * aggrinterval: time in ms for timer to trigger
> * loop: how many row of data to feed
> * statsd: to send result to statsd
> * psrc: source parallelism
> * pJ2R: parallelism of map operator(JsonRecTranslator)
> * pAggr: parallelism of process+timer operator(AggregationDuration) 
> We are running in VMWare, 5 Task Managers and each has 32 slots.
> lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                32
> On-line CPU(s) list:   0-31
> Thread(s) per core:    1
> Core(s) per socket:    1
> Socket(s):             32
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 63
> Model name:            Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
> Stepping:              2
> CPU MHz:               2593.993
> BogoMIPS:              5187.98
> Hypervisor vendor:     VMware
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              20480K
> NUMA node0 CPU(s):     0-31
> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm epb fsgsbase smep dtherm ida arat pln pts
>               total        used        free      shared  buff/cache   available
> Mem:             98          24          72           0           1          72
> Swap:             3           0           3
> Please refer TM.png and JM.png for further details.
> The test without any checkpoint enable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)