Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/02/01 10:37:00 UTC
[jira] [Work logged] (BEAM-11657) Kafka read performance regression due to added header support
[ https://issues.apache.org/jira/browse/BEAM-11657?focusedWorklogId=545248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545248 ]
ASF GitHub Bot logged work on BEAM-11657:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Feb/21 10:36
Start Date: 01/Feb/21 10:36
Worklog Time Spent: 10m
Work Description: scwhittle commented on pull request #13782:
URL: https://github.com/apache/beam/pull/13782#issuecomment-770755221
I'm not sure whether there are dashboards, but the perf results from the console output were:
16:21:21 org.apache.beam.sdk.io.kafka.KafkaIOIT > testKafkaIOReadsAndWritesCorrectlyInStreaming STANDARD_OUT
16:21:21 Load test results for test (ID): 539adecb-21ea-4aaf-935c-95c9bb9b91c5 and timestamp: 2021-01-28T15:02:34.094000000Z:
16:21:21 Metric: Value:
16:21:21 read_time 1.385
16:21:21 write_time 9.316
16:21:21 run_time 10.701
The subsequent run https://ci-beam.apache.org/job/beam_PerformanceTests_Kafka_IO/1871/console had results:
20:34:52 org.apache.beam.sdk.io.kafka.KafkaIOIT > testKafkaIOReadsAndWritesCorrectlyInStreaming STANDARD_OUT
20:34:52 Load test results for test (ID): cb47f5d5-0102-49ce-8fd1-73eb3e6bbe40 and timestamp: 2021-01-28T19:16:14.211000000Z:
20:34:52 Metric: Value:
20:34:52 read_time 2.873
20:34:52 write_time 14.97
20:34:52 run_time 17.843
I'm not sure how stable these are. write_time shouldn't be directly affected, but if the pipeline is reading and writing in parallel, the CPU wasted on reading could impact write performance as well.
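To put the two sets of numbers side by side, here is a minimal sketch that computes the ratio of the subsequent run to the first run for each metric. The class and method names are illustrative only, the values are copied from the console output above, and the units are whatever the load test reports (not stated in the output).

```java
public class RunComparison {
    /** Ratio of the subsequent run's value to the first run's value. */
    static double ratio(double first, double subsequent) {
        return subsequent / first;
    }

    public static void main(String[] args) {
        // Values copied from the two console outputs quoted above.
        String[] metrics = {"read_time", "write_time", "run_time"};
        double[] firstRun = {1.385, 9.316, 10.701};
        double[] subsequentRun = {2.873, 14.97, 17.843};

        for (int i = 0; i < metrics.length; i++) {
            System.out.printf(
                "%-10s subsequent/first = %.2f%n",
                metrics[i], ratio(firstRun[i], subsequentRun[i]));
        }
    }
}
```

On these two samples the ratios come out at roughly 2.07 for read_time, 1.61 for write_time and 1.67 for run_time. With only one run on each side, this says nothing about run-to-run variance.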
Issue Time Tracking
-------------------
Worklog Id: (was: 545248)
Time Spent: 1h 40m (was: 1.5h)
> Kafka read performance regression due to added header support
> -------------------------------------------------------------
>
> Key: BEAM-11657
> URL: https://issues.apache.org/jira/browse/BEAM-11657
> Project: Beam
> Issue Type: Bug
> Components: io-java-kafka
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: P2
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Support for headers in KafkaIO reads was recently added:
> https://issues.apache.org/jira/browse/BEAM-10865
> This introduced several reflection calls into the path of advancing KafkaUnboundedReader. While separately running benchmarks, I noticed this regression.
> The reflection calls currently come from:
> ConsumerSpEL.hasHeaders -> the result can be cached, similar to the other booleans
> the deserialize key and value methods -> these calls could be avoided in cases where headers are not being examined (at a minimum, they can be avoided for known coders like ByteArrayDeserializer)
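The caching idea in the first item can be sketched roughly as follows. This is not the actual ConsumerSpEL code: CachedReflectionCheck, RecordWithHeaders, RecordWithoutHeaders and the lookups counter are hypothetical stand-ins, chosen so the example runs without a Kafka dependency. The point is only that the reflective lookup happens once at construction, and the per-record path reads a plain boolean field.

```java
import java.lang.reflect.Method;

public class CachedReflectionCheck {
    /** Hypothetical stand-in for a record type that exposes headers(). */
    public static class RecordWithHeaders {
        public Object headers() {
            return null;
        }
    }

    /** Hypothetical stand-in for an older record type without headers(). */
    public static class RecordWithoutHeaders {}

    /** Counts reflective lookups so the effect of caching is observable. */
    static int lookups = 0;

    /** Computed once in the constructor, then reused on every call. */
    private final boolean hasHeaders;

    public CachedReflectionCheck(Class<?> recordClass) {
        this.hasHeaders = lookupHeadersMethod(recordClass);
    }

    /** The expensive part: scans the public methods for a no-arg headers(). */
    private static boolean lookupHeadersMethod(Class<?> recordClass) {
        lookups++;
        for (Method m : recordClass.getMethods()) {
            if (m.getName().equals("headers") && m.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    /** The hot-path accessor: a field read, no reflection. */
    public boolean hasHeaders() {
        return hasHeaders;
    }

    public static void main(String[] args) {
        CachedReflectionCheck check = new CachedReflectionCheck(RecordWithHeaders.class);
        boolean last = false;
        for (int i = 0; i < 1_000_000; i++) {
            last = check.hasHeaders(); // simulates a per-record check
        }
        System.out.println("hasHeaders=" + last + ", reflective lookups=" + lookups);
    }
}
```

The second item (skipping header-aware deserialize calls when headers are not examined) is a separate change and is not covered by this sketch.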