Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/02/01 10:37:00 UTC

[jira] [Work logged] (BEAM-11657) Kafka read performance regression due to added header support

     [ https://issues.apache.org/jira/browse/BEAM-11657?focusedWorklogId=545248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545248 ]

ASF GitHub Bot logged work on BEAM-11657:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Feb/21 10:36
            Start Date: 01/Feb/21 10:36
    Worklog Time Spent: 10m 
      Work Description: scwhittle commented on pull request #13782:
URL: https://github.com/apache/beam/pull/13782#issuecomment-770755221


   Not sure if there are dashboards, but the perf results from the console output were:
   16:21:21 org.apache.beam.sdk.io.kafka.KafkaIOIT > testKafkaIOReadsAndWritesCorrectlyInStreaming STANDARD_OUT
   16:21:21     Load test results for test (ID): 539adecb-21ea-4aaf-935c-95c9bb9b91c5 and timestamp: 2021-01-28T15:02:34.094000000Z:
   16:21:21                      Metric:                    Value:
   16:21:21                    read_time                     1.385
   16:21:21                   write_time                     9.316
   16:21:21                     run_time                    10.701
   
   The subsequent run https://ci-beam.apache.org/job/beam_PerformanceTests_Kafka_IO/1871/console had results:
   20:34:52 org.apache.beam.sdk.io.kafka.KafkaIOIT > testKafkaIOReadsAndWritesCorrectlyInStreaming STANDARD_OUT
   20:34:52     Load test results for test (ID): cb47f5d5-0102-49ce-8fd1-73eb3e6bbe40 and timestamp: 2021-01-28T19:16:14.211000000Z:
   20:34:52                      Metric:                    Value:
   20:34:52                    read_time                     2.873
   20:34:52                   write_time                     14.97
   20:34:52                     run_time                    17.843
   
   I'm not sure how stable these numbers are. write_time shouldn't be directly affected, but if the pipeline is reading and writing in parallel, the CPU wasted on reading could impact write performance as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 545248)
    Time Spent: 1h 40m  (was: 1.5h)

> Kafka read performance regression due to added header support
> -------------------------------------------------------------
>
>                 Key: BEAM-11657
>                 URL: https://issues.apache.org/jira/browse/BEAM-11657
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kafka
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Support for headers in KafkaIO reads was recently added:
> https://issues.apache.org/jira/browse/BEAM-10865
> This introduced several reflection calls into the path of advancing KafkaUnboundedReader. I noticed the regression while running separate benchmarks.
> The reflection calls currently come from:
> ConsumerSpEL.hasHeaders -> the result can be cached, similar to other booleans
> the key and value deserialize methods -> the calls could be avoided in cases where headers are not being examined (at a minimum, they can be avoided for known coders like ByteArrayDeserializer)
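
As an illustration of the first item above (caching the result of a reflective check), here is a minimal sketch. It is not the actual ConsumerSpEL or KafkaUnboundedReader code. The class CachedHeaderCheck, the stand-in record classes, and the reflectionLookups counter are all hypothetical names made up for this example. It only shows the general idea: do the reflective lookup once at construction and read a boolean field on the hot path.

```java
/**
 * Sketch only. All names here are hypothetical and are not Beam or Kafka APIs.
 * The reflective "does this class have a headers() method?" check runs once,
 * in the constructor, and the per-record path reads a cached boolean.
 */
public class CachedHeaderCheck {

  /** Hypothetical stand-in for a record class that exposes headers(). */
  public static class RecordWithHeaders {
    public String headers() {
      return "example-header";
    }
  }

  /** Hypothetical stand-in for an older record class without headers(). */
  public static class RecordWithoutHeaders {}

  /** Counts reflective lookups, only so the caching effect can be observed. */
  public static int reflectionLookups = 0;

  /** Cached result of the reflective check. */
  private final boolean hasHeaders;

  public CachedHeaderCheck(Class<?> recordClass) {
    // Reflection happens exactly once per instance, here.
    this.hasHeaders = lookupHeadersMethod(recordClass);
  }

  private static boolean lookupHeadersMethod(Class<?> recordClass) {
    reflectionLookups++;
    try {
      recordClass.getMethod("headers");
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  /** Hot-path accessor: a plain field read, no reflection. */
  public boolean hasHeaders() {
    return hasHeaders;
  }

  public static void main(String[] args) {
    CachedHeaderCheck check = new CachedHeaderCheck(RecordWithHeaders.class);
    boolean seen = false;
    // Simulates calling the check once per record while advancing a reader.
    for (int i = 0; i < 1_000_000; i++) {
      seen = check.hasHeaders();
    }
    System.out.println("hasHeaders=" + seen + ", reflective lookups=" + reflectionLookups);
  }
}
```

The same pattern would presumably apply to the second item as well: decide once, up front, whether the header-aware deserialize path is needed, rather than resolving it reflectively for every record.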



--
This message was sent by Atlassian Jira
(v8.3.4#803005)