Posted to jira@kafka.apache.org by "dan norwood (JIRA)" <ji...@apache.org> on 2019/05/23 15:19:00 UTC
[jira] [Created] (KAFKA-8414) org.apache.kafka.common.metrics.MetricsTest.testConcurrentReadUpdateReport hang
dan norwood created KAFKA-8414:
----------------------------------
Summary: org.apache.kafka.common.metrics.MetricsTest.testConcurrentReadUpdateReport hang
Key: KAFKA-8414
URL: https://issues.apache.org/jira/browse/KAFKA-8414
Project: Kafka
Issue Type: Bug
Reporter: dan norwood
caveat: this only happens on AMD EPYC machines with >=48 cpus. below is the machine info (`lscpu` output) from the various `*a.*` aws instance sizes i ran against.
i noticed what seems like a deadlock when running `org.apache.kafka.common.metrics.MetricsTest` on an aws instance with 96 vCPUs (specifically an m5a.24xlarge). after some debugging, the offending code seems to be [https://github.com/apache/kafka/blob/trunk/clients/src/test/java/org/apache/kafka/common/metrics/MetricsTest.java#L776-L778]
{code:java}
public void run() {
    try {
        while (alive.get()) {
            op.run();
        }
    } catch (Throwable t) {
        log.error("Metric {} failed with exception", opName, t);
    }
}
{code}
since the methods called by `op.run()` are all synchronized and the loop never pauses, the threads end up hammering those synchronized blocks nonstop. after adding some logging i saw steadily increasing wait times for entry into each synchronized block. so this is not *really* a deadlock or hang, but a progressive slowdown that makes the test unrunnable.
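the kind of logging described above can be reproduced outside of kafka with a small probe. this is only a sketch under assumptions: `MonitorWaitProbe`, `timedOp` and `hammer` are hypothetical names made up for illustration, and the single shared lock stands in for whatever the metrics code actually synchronizes on. it starts several threads that enter a synchronized block in a tight loop and records the longest time any thread spent waiting to get in.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// hypothetical stand-alone probe; none of these names exist in the kafka code base
class MonitorWaitProbe {
    private final Object lock = new Object();
    private final AtomicLong entries = new AtomicLong();
    private final AtomicLong maxWaitNanos = new AtomicLong();

    // records how long the calling thread waited to enter the synchronized block
    public void timedOp() {
        long start = System.nanoTime();
        synchronized (lock) {
            long waited = System.nanoTime() - start;
            entries.incrementAndGet();
            maxWaitNanos.accumulateAndGet(waited, Math::max);
        }
    }

    public long entries() {
        return entries.get();
    }

    public long maxWaitNanos() {
        return maxWaitNanos.get();
    }

    // runs `threads` workers that call timedOp() in a tight loop for about `millis` ms
    public static MonitorWaitProbe hammer(int threads, long millis) {
        MonitorWaitProbe probe = new MonitorWaitProbe();
        AtomicBoolean alive = new AtomicBoolean(true);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                while (alive.get()) {
                    probe.timedOp();
                }
            });
        }
        try {
            Thread.sleep(millis);
            alive.set(false);
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // make sure the workers stop even if the wait above was interrupted
            alive.set(false);
            pool.shutdownNow();
        }
        return probe;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        MonitorWaitProbe probe = hammer(threads, 1000);
        System.out.println("threads=" + threads
                + " entries=" + probe.entries()
                + " maxWaitNanos=" + probe.maxWaitNanos());
    }
}
```

running `main` on a small and a large instance and comparing `maxWaitNanos` would give a rough idea of whether lock-entry wait grows with the cpu count. it does not prove that this is what happens inside `MetricsTest`.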
the offending op seems to be [https://github.com/apache/kafka/blob/trunk/clients/src/test/java/org/apache/kafka/common/metrics/MetricsTest.java#L747]
{code:java}
Future<?> reportFuture = executorService.submit(new ConcurrentMetricOperation(alive, "report", () -> reporter.processMetrics()));
{code}
possible fix:
adding a `Thread.sleep(0, 1)` inside the run loop of `ConcurrentMetricOperation` seems to allow the test to pass, but i'm not sure that it wouldn't mask an issue that the test is meant to detect.
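a minimal sketch of that change, assuming the loop has the shape shown earlier. `PacedOperation` and `runFor` are hypothetical names used only for illustration. this is not the real `ConcurrentMetricOperation`, and the logging from the original is left out to keep it self-contained.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// hypothetical stand-in for the test's ConcurrentMetricOperation; not the real class
class PacedOperation implements Runnable {
    private final AtomicBoolean alive;
    private final Runnable op;

    PacedOperation(AtomicBoolean alive, Runnable op) {
        this.alive = alive;
        this.op = op;
    }

    @Override
    public void run() {
        try {
            while (alive.get()) {
                op.run();
                // short pause so other threads get a chance to take the lock
                Thread.sleep(0, 1);
            }
        } catch (InterruptedException e) {
            // restore the interrupt flag and stop quietly
            Thread.currentThread().interrupt();
        }
    }

    // runs a counting op for about `millis` ms and returns how often it ran
    public static long runFor(long millis) {
        AtomicBoolean alive = new AtomicBoolean(true);
        AtomicLong calls = new AtomicLong();
        Thread worker = new Thread(new PacedOperation(alive, calls::incrementAndGet));
        worker.start();
        try {
            Thread.sleep(millis);
            alive.set(false);
            worker.join();
        } catch (InterruptedException e) {
            alive.set(false);
            Thread.currentThread().interrupt();
        }
        return calls.get();
    }
}
```

depending on the jdk version and the os, `Thread.sleep(0, 1)` may pause for much longer than one nanosecond, so the op runs far less often than in the tight loop. that is part of why it might hide whatever the test is meant to detect.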
Good:
t3a.large
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7571
Stepping: 2
CPU MHz: 2200.116
BogoMIPS: 4400.23
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext cpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
```
t3a.2xlarge
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7571
Stepping: 2
CPU MHz: 2199.916
BogoMIPS: 4399.83
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext cpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
```
m5a.4xlarge
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7571
Stepping: 2
CPU MHz: 2585.550
BogoMIPS: 4399.98
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core cpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
```
Bad:
m5a.12xlarge
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
NUMA node(s): 3
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7571
Stepping: 2
CPU MHz: 2315.397
BogoMIPS: 4400.13
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7,24-31
NUMA node1 CPU(s): 8-15,32-39
NUMA node2 CPU(s): 16-23,40-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core cpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
```
m5a.24xlarge
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 6
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7571
Stepping: 2
CPU MHz: 2421.512
BogoMIPS: 4399.19
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7,48-55
NUMA node1 CPU(s): 8-15,56-63
NUMA node2 CPU(s): 16-23,64-71
NUMA node3 CPU(s): 24-31,72-79
NUMA node4 CPU(s): 32-39,80-87
NUMA node5 CPU(s): 40-47,88-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core cpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
```