You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/01/22 01:45:00 UTC

[jira] [Work logged] (BEAM-9167) Reduce overhead of Go SDK side metrics

     [ https://issues.apache.org/jira/browse/BEAM-9167?focusedWorklogId=375358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375358 ]

ASF GitHub Bot logged work on BEAM-9167:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Jan/20 01:44
            Start Date: 22/Jan/20 01:44
    Worklog Time Spent: 10m 
      Work Description: lostluck commented on pull request #10654: [BEAM-9167] Reduce Go SDK metric overhead
URL: https://github.com/apache/beam/pull/10654
 
 
   This PR dramatically reduces the overhead of metrics in the Go SDK.
   
   A contemporary side by side comparison of the benchmark in the metrics package on my current machine:
   benchmark                                        old ns/op     new ns/op     delta
   BenchmarkMetrics/counter_inplace-12              585           249           -57.44%
   BenchmarkMetrics/distribution_inplace-12         622           270           -56.59%
   BenchmarkMetrics/gauge_inplace-12                812           311           -61.70%
   BenchmarkMetrics/counter_predeclared-12          227           15.8          -93.04%
   BenchmarkMetrics/distribution_predeclared-12     282           24.0          -91.49%
   BenchmarkMetrics/gauge_predeclared-12            389           63.7          -83.62%
   
   benchmark                                        old allocs     new allocs     delta
   BenchmarkMetrics/counter_inplace-12              4              1              -75.00%
   BenchmarkMetrics/distribution_inplace-12         4              1              -75.00%
   BenchmarkMetrics/gauge_inplace-12                4              1              -75.00%
   BenchmarkMetrics/counter_predeclared-12          3              0              -100.00%
   BenchmarkMetrics/distribution_predeclared-12     3              0              -100.00%
   BenchmarkMetrics/gauge_predeclared-12            3              0              -100.00%
   
   benchmark                                        old bytes     new bytes     delta
   BenchmarkMetrics/counter_inplace-12              160           48            -70.00%
   BenchmarkMetrics/distribution_inplace-12         192           48            -75.00%
   BenchmarkMetrics/gauge_inplace-12                192           48            -75.00%
   BenchmarkMetrics/counter_predeclared-12          48            0             -100.00%
   BenchmarkMetrics/distribution_predeclared-12     80            0             -100.00%
   BenchmarkMetrics/gauge_predeclared-12            80            0             -100.00%
   
   In particular this PR moves away from a global datastore for all metrics towards a perBundle based countersets. This allows for the removal of the per layer locks and the global lock that needed to be checked since all bundles had to check the same datastore. Now they only store a metric cell in the global store on first creation (still stored per bundle and per ptransform).
   
   A subsequent change will remove the global store altogether in favour of better exposing the metrics per bundle, and allowing a callback visitor to thread-safely access the data inside each metric. This will also permit removing the dependency on the protos from the package, which was a mistake I made when I first wrote the package.
   
   Further, Counters now use atomic operations rather than locks, which additional speeds them up vs the previous mutex approach.
   
   Counter "names" are hashed ahead of time and the hash value cached in the proxy to increase the speed of subsequent lookups using the same proxy object.
   
   This does make the proxies unsafe to use concurrently within the same bundle prior to first use, but this matches the general rule of Beam runners managing the concurrency for efficient processing, and that framework constructs are not safe for concurrent use by user code, without user managed locks.
   
   As an exploration, I did try using sync.Map to avoid the above restriction, but the overhead for the additional interface wraping and unwraping was significant enough that this approach was worthwhile. 
   This may be worth revisiting if Go gains Generics, as that would probably avoid this cost.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   XLang | --- | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) 
   Portable | --- | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 375358)
    Remaining Estimate: 0h
            Time Spent: 10m

> Reduce overhead of Go SDK side metrics
> --------------------------------------
>
>                 Key: BEAM-9167
>                 URL: https://issues.apache.org/jira/browse/BEAM-9167
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Locking overhead due to the global store and local caches of SDK counter data can dominate certain workloads, which means we can do better.
> Instead of having a global store of metrics data to extract counters, we should use per ptransform (or per bundle) counter sets, which would avoid requiring locking per counter operation. The main detriment compared to the current implementation is that a user would need to add their own locking if they were to spawn multiple goroutines to process a Bundle's work in a DoFn.
> Given that self multithreaded DoFns aren't recommended/safe in Java,  largely impossible in Python, and the other beam Go SDK provided constructs (like Iterators and Emitters) are not thread safe, this is a small concern, provided the documentation is clear on this.
> Removing the locking and switching to atomic ops reduces the overhead significantly in example jobs and in the benchmarks.
> A second part of this change should be to move the exec package to manage it's own per bundle state, rather than relying on a global datastore to extract the per bundle,per ptransform values.
> Related: https://issues.apache.org/jira/browse/BEAM-6541 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)