You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/16 15:39:00 UTC

[jira] [Work logged] (BEAM-3310) Push metrics to a backend in an runner agnostic way

     [ https://issues.apache.org/jira/browse/BEAM-3310?focusedWorklogId=81203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81203 ]

ASF GitHub Bot logged work on BEAM-3310:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Mar/18 15:38
            Start Date: 16/Mar/18 15:38
    Worklog Time Spent: 10m 
      Work Description: echauchot commented on issue #4548: [BEAM-3310] Metrics pusher
URL: https://github.com/apache/beam/pull/4548#issuecomment-373752498
 
 
   
   
   
   
   Review status: 0 of 26 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.
   
   ---
   
   *[runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsPusher.java, line 79 at r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4mWhVmcivKjpDBM54k:-L6X-cVDGhjq1P1ALGWL:bkzx7tm) ([raw file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsPusher.java#L79)):*
   <details><summary><i>Previously, swegner (Scott Wegner) wrote…</i></summary><blockquote>
   
   Good question. This is similar to the situation with DirectRunner, and I had to look it up to see how it works: https://github.com/apache/beam/blob/081de891e05a6c84db0f3fd03e3ad95f362d5c4c/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java#L283
   
   It seems that the `DefaultValueFactory` implementation uses reflection to instantiate DirectRunner at runtime without a compile-time reference. We could do something similar here, with the requirement that that MetricsHttpSink is either on the classpath (via runners-core), or a different sink is configured.
   </blockquote></details>
   
   ok I'll use reflexion and put MetricsSink in sdk as it is done for PipelineRunner. The implementations of MetricsSink can stay in runner-core
   
   ---
   
   *[runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsPusher.java, line 82 at r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4mXJjriEP9_vXEL0Ef:-L6aX3NXGH-H1_daXY7-:b-wy47yr) ([raw file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsPusher.java#L82)):*
   <details><summary><i>Previously, swegner (Scott Wegner) wrote…</i></summary><blockquote>
   
   That's a good idea. Note that if you just need an implementation for test asserts you can use a mock implementation as well.
   </blockquote></details>
   
   OK I have split DummyMetricsSink into NoOpMetricsSink (default sink) and TestMetricsSink (used for tests) and I no more start the polling thread and no more push when NoOpSink is provided.
   
   ---
   
   *[sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml, line 42 at r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4mb9MCqXj2PrgZGqt6:-L6VjD2dvP4FOsWIwwun:bub3ovg) ([raw file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml#L42)):*
   <details><summary><i>Previously, swegner (Scott Wegner) wrote…</i></summary><blockquote>
   
   I tried removing this and running locally and I don't see the FindBugs warning.  Perhaps this is no longer relevant?
   
   If you are still seeing an error, can you paste it here? For reference, I am building via: `mvn clean install -T1C -DskipTests -Dcheckstyle.skip -am  -pl runners/core-java/`
   </blockquote></details>
   
   If I remove the exception in findbugs and launch 
   mvn clean install -Prelease in runners/core-java, I still get a 
    Redundant null check at MetricsPusher.java:[line 61] RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE
   
   ---
   
   *[runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/JsonMetricsSerializer.java, line 29 at r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4lluU2UEFYijwXu-hj:-L7iF-6UtWhYh16oCUMz:b-rqyxqb) ([raw file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/JsonMetricsSerializer.java#L29)):*
   <details><summary><i>Previously, swegner (Scott Wegner) wrote…</i></summary><blockquote>
   
   I disagree that manual json serialization is the right way to go. This code is brittle and cumbersome to maintain.
   
   Brittle: Does this serialization code correctly handle edge cases, such as null values, encoding special characters and very large numeric values? It's unclear, and writing a sufficient set of tests to validate each field would be tedious.
   
   Cumbersome: Making changes to this or related code will be difficult. What if the MetricQueryResults schema changes, or we want to add optional pretty-printing for debugging, or we want to make some fields optional for serialization? Starting with a real serialization library would make these changes easy. Rolling our own makes the code hard to maintain.
   
   If MetricQueryResults is poorly suited for serialization libraries, that means that consumers of the data will need manual deserialization as well. We should ensure that the data model is easy to consume, making modifications to MetricQueryResults if necessary. One idea is to define a POJO implementation of the MetricsQueryResults interface to use for serialization/deserialization. Another would be to use protobuf to define a language-agnostic schema with auto-generated language bindings and serialization format. This is what the FnAPI in the Beam portability framework is using: https://github.com/google/protobuf
   
   Mentioning a few people who have previously chimed in on the dev list: @lukecwik @robertwb @rmannibucau
   </blockquote></details>
   
   Fair enough, I agree.
   So I moved the MetricsHttpSink POC to a new runner-extensions-java module depending on jackson and I used jackson for serialization.
   
   Regarding your question about MetricQueryResults being poorly suited for serialization libraries: actually MetricQueryResults is almost suited for out of the box serialization with serialization libraries. Classes are already POJOs, the only thing that it lacks is getters that are called get* to be compatible with default object mappers. So as you suggested I preferred modifying  MetricQueryResults and related classes getter names than adding new POJO classes that do exactly the same but with get* names. I isolated this change in a commit so that it can be cherry picked to another PR if needed.
   
   ---
   
   *[runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsSink.java, line 28 at r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4mZncWXgIUYuM5Qf8j:-L6WFJd5xKRkRRaoHwZB:bnel6nh) ([raw file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsSink.java#L28)):*
   <details><summary><i>Previously, swegner (Scott Wegner) wrote…</i></summary><blockquote>
   
   Is there an actual use-case for HttpSink which requires different serializers? If HttpSink is just a proof-of-concept implementation then I don't think there's a need to focus too much on customizability.
   
   I could imagine the functionality in HttpSink being useful as a base for other sinks like Graphite. In that case, we should ensure the common functionality is easy to pull out and reuse within a Graphite sync. For that, having the concept of a MetricsSerializer is useful, but I don't know if it necessarily needs this base class.
   </blockquote></details>
   
   Indeed HttpSink is more a POC. If we would like to use it as a base class for other sinks like graphite, the reusable features would be:
   - serialize a metric: this can be done in the metric pojo you were talking about
   - issue an http request: this is not much to reuse especially if graphite (for ex) requests need extra parameters or headers (like authentication ...)
   => So I agree it might be over designed. I'll convert MetricsSink to an interface with a single writeMetrics method.
   
   ---
   
   
   *Comments from [Reviewable](https://reviewable.io:443/reviews/apache/beam/4548)*
   <!-- Sent from Reviewable.io -->
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 81203)
    Time Spent: 1.5h  (was: 1h 20m)

> Push metrics to a backend in an runner agnostic way
> ---------------------------------------------------
>
>                 Key: BEAM-3310
>                 URL: https://issues.apache.org/jira/browse/BEAM-3310
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The idea is to avoid relying on the runners to provide access to the metrics (either at the end of the pipeline or while it runs) because they don't have all the same capabilities towards metrics (e.g. spark runner configures sinks  like csv, graphite or in memory sinks using the spark engine conf). The target is to push the metrics in the common runner code so that no matter the chosen runner, a user can get his metrics out of beam.
> Here is the link to the discussion thread on the dev ML: https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
> And the design doc:
> https://s.apache.org/runner_independent_metrics_extraction



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)