You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Maximilian Michels <mx...@apache.org> on 2018/11/02 10:57:37 UTC
Re: Follow up ideas, to simplify creating MonitoringInfos.

> I was unable to get this to compile and could find no examples of this on the proto github.

Would it be helpful to post the compiler output?

-Max

On 31.10.18 19:19, Lukasz Cwik wrote:
> I see and don't know how to help you beyond what your already 
> suggesting. From what I remember, maps were added as syntactic sugar of 
> lists of key value pairs.
> 
> On Tue, Oct 30, 2018 at 5:37 PM Alex Amato <ajamato@google.com 
> <ma...@google.com>> wrote:
> 
>     I am not sure on the correct syntax to populate the instances of my
>     MonitoringInfoSpec messages
> 
>     message MonitoringInfoSpec {
> 
>     string urn = 1;
> 
>     string type_urn = 2;
> 
>     repeated string required_labels = 3;
> 
>     *map<string, string> annotations = 4;*
> 
>     }
> 
> 
>     Notice how the annotations field is not used anywhere. I was unable
>     to get this to compile and could find no examples of this on the
>     proto github. Perhaps I'll have to reach out to them. I was
>     wondering if anyone here was familiar first.
> 
> 
>     message MonitoringInfoSpecs {
> 
>     enum MonitoringInfoSpecsEnum {
> 
>        USER_COUNTER = 0 [(monitoring_info_spec) = {
> 
>          urn: "beam:metric:user",
> 
>          type_urn: "beam:metrics:sum_int_64",
> 
>        }];
> 
> 
>        ELEMENT_COUNT = 1 [(monitoring_info_spec) = {
> 
>          urn: "beam:metric:element_count:v1",
> 
>          type_urn: "beam:metrics:sum_int_64",
> 
>          required_labels: ["PTRANSFORM"],
> 
>        }];
> 
> 
>        START_BUNDLE_MSECS = 2 [(monitoring_info_spec) = {
> 
>          urn: "beam:metric:pardo_execution_time:start_bundle_msecs:v1",
> 
>          type_urn: "beam:metrics:sum_int_64",
> 
>          required_labels: ["PTRANSFORM"],
> 
>        }];
> 
> 
>        PROCESS_BUNDLE_MSECS = 3 [(monitoring_info_spec) = {
> 
>          urn: "beam:metric:pardo_execution_time:process_bundle_msecs:v1",
> 
>          type_urn: "beam:metrics:sum_int_64",
> 
>          required_labels: ["PTRANSFORM"],
> 
>        }];
> 
> 
>        FINISH_BUNDLE_MSECS = 4 [(monitoring_info_spec) = {
> 
>          urn: "beam:metric:pardo_execution_time:finish_bundle_msecs:v1",
> 
>          type_urn: "beam:metrics:sum_int_64",
> 
>          required_labels: ["PTRANSFORM"],
> 
>        }];
> 
> 
>        TOTAL_MSECS = 5 [(monitoring_info_spec) = {
> 
>          urn: "beam:metric:ptransform_execution_time:total_msecs:v1",
> 
>          type_urn: "beam:metrics:sum_int_64",
> 
>          required_labels: ["PTRANSFORM"],
> 
>        }];
> 
>     }
> 
>     }
> 
> 
> 
> 
>     On Tue, Oct 30, 2018 at 2:01 PM Lukasz Cwik <lcwik@google.com
>     <ma...@google.com>> wrote:
> 
>         I'm not sure what you mean by "Using a map in an option."
> 
>         For your second issue, the google docs around this show[1]:
> 
>         Note that if you want to use a custom option in a package other
>         than the one in which it was defined, you must prefix the option
>         name with the package name, just as you would for type names.
>         For example:
> 
>         // foo.proto
>         import "google/protobuf/descriptor.proto";
>         package foo;
>         extend google.protobuf.MessageOptions {
>            optional string my_option = 51234;
>         }
> 
>         // bar.proto
>         import "foo.proto";
>         package bar;
>         message MyMessage {
>            option (foo.my_option) = "Hello world!";
>         }
> 
> 
>         1:
>         https://developers.google.com/protocol-buffers/docs/proto#customoptions
> 
> 
>         On Mon, Oct 29, 2018 at 5:19 PM Alex Amato <ajamato@google.com
>         <ma...@google.com>> wrote:
> 
>             Hi Robert and community, :)
> 
>             I was starting to code this up, but I wasn't sure exactly
>             how to do some of the proto syntax. Would you mind taking a
>             look at this doc
>             <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>
>             and let me know if you know how to resolve any of these issues:
> 
>               * Using a map in an option.
>               * Referring to string "constants" from other enum options
>                 in .proto files.
> 
>             Please see the comments I have listed in the doc
>             <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>,
>             and a few alternative suggestions.
>             Thanks
> 
>             On Wed, Oct 24, 2018 at 10:08 AM Alex Amato
>             <ajamato@google.com <ma...@google.com>> wrote:
> 
>                 Okay. That makes sense. Using runtime validation and
>                 protos is what I was thinking as well.
>                 I'll include you as a reviewer in my PRs.
> 
>                 As for the choice of a builder/constructor/factory. If
>                 we go with factory methods/constructor then we will need
>                 a method for each metric type (intCounter, latestInt64,
>                 ...). Plus, then I don't think we can have any
>                 constructor parameters for labels, we will just need to
>                 accept a dictionary for label keys+values like you say.
>                 This is because we cannot offer a method for each URN
>                 and its combination of labels, and we should avoid such
>                 static detection, as you say.
> 
>                 The builder however, allows us to add a method for
>                 setting each label. Since there are a small number of
>                 labels I think this should be fine. A specific metric
>                 URN will have a specific set of labels which we can
>                 validate being set. Any variant of this should use a
>                 different label (or a new version in the label). Thus,
>                 the builder can give an advantage, making it easier to
>                 set label values without needing to provide the correct
>                 key for the label, when its set. You just need to call
>                 the correct method.
> 
>                 Perhaps it might be best to leave this open to each SDK
>                 based on its language style (Builder, Factory, etc.) ,
>                 but make sure we use the protos+runtime validation.
> 
>                 On Wed, Oct 24, 2018 at 7:02 AM Robert Bradshaw
>                 <robertwb@google.com <ma...@google.com>> wrote:
> 
>                     Thanks for bringing this to the list; it's a good
>                     question.
> 
>                     I think the difficulty comes from trying to
>                     statically define a lists
>                     of possibilities that should instead be runtime
>                     values. E.g. we
>                     currently we're up to about a dozen distinct types,
>                     and having a
>                     setter for each is both verbose and not very
>                     extensible (especially
>                     replicated cross language). I'm not sure the set of
>                     possible labels is
>                     fixed either. Generally in the FnAPI we've been
>                     using shared constants
>                     for this kind of thing instead. (I was wary about
>                     the protos for the
>                     same reasons; it would be good to avoid leaking this
>                     through to each
>                     of the various SDKs.) The amount of static
>                     typing/validation one gets
>                     depends on how much logic you build into each of
>                     these methods (e.g.
>                     does it (almost?) always "metric" which is assumed
>                     to already be
>                     encoded correctly, or a specific type that has
>                     tradeoffs with the
>                     amount you can do generically (e.g. we have code
>                     that first loops over
>                     counters, then over distributions, then over gauges,
>                     and I don't think
>                     we want continue that pattern all M places for all N
>                     types)).
> 
>                     I would probably lean towards mostly doing runtime
>                     validation here.
>                     Specifically, one would have a data file that
>                     defines, for each known
>                     URN, its type along with the set of
>                     permitted/expected/required
>                     labels. On construction a MonitoringInfo data point,
>                     one would provide
>                     a URN + value + labelMap, and optionally a type. On
>                     construction, the
>                     constructor (method, factory, whatever) would look
>                     up the URN to
>                     determine the type (or throw an error if it's both
>                     not known and not
>                     provided), which could be then used to fetch an
>                     encoder of sorts (that
>                     can go from value <-> proto encoding, possibly with
>                     some validation).
>                     If labels and/or annotations are provided and the
>                     URN is known, we can
>                     validate these as well.
> 
>                     As for proto/enums vs. yaml, the former is nice
>                     because code
>                     generation comes for free, but has turned out to be
>                     much more verbose
>                     (both the definition and the use) and I'm still on
>                     the fence whether
>                     it's a net win.
> 
>                     (Unfortunately AutoValue won't help much here, as
>                     the ultimate goal is
>                     to wrap a very nested proto OneOf, ideally with some
>                     validation.
>                     (Also, in Python, generally, having optional, named
>                     arguments makes
>                     this kind of builder pattern less needed.))
> 
>                     On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles
>                     <kenn@apache.org <ma...@apache.org>> wrote:
>                      >
>                      > FWIW AutoValue will build most of that class for
>                     you, if it is as you say.
>                      >
>                      > Kenn
>                      >
>                      > On Tue, Oct 23, 2018 at 6:04 PM Alex Amato
>                     <ajamato@google.com <ma...@google.com>> wrote:
>                      >>
>                      >> Hi Robert + beam dev list,
>                      >>
>                      >> I was thinking about your feedback in PR#6205,
>                     and agree that this monitoring_infos.py became a bit
>                     big.
>                      >>
>                      >> I'm working on the Java Implementation of this
>                     now, and want to incorporate some of these ideas and
>                     improve on this.
>                      >>
>                      >> I that that I should make something like a
>                     MonitoringInfoBuilder class. With a few methods
>                      >>
>                      >> setUrn
>                      >> setTimestamp
>                      >> setting the value (One method for each Type we
>                     support. Setting this will also set the type string)
>                      >>
>                      >> setInt64CounterValue
>                      >> setDoubleCounterValue
>                      >> setLatestInt64
>                      >> setTopNInt64
>                      >> setMonitoringDataTable
>                      >> setDistributionInt64
>                      >> ...
>                      >>
>                      >> setting labels (will set the key and value
>                     properly for the label)
>                      >>
>                      >> setPTransform(value)
>                      >> setPcollection(value)
>                      >> ...
>                      >>
>                      >>
>                      >> I think this will make building a metric much
>                     easier, you would just call the 4 methods and the
>                     .build(). These builders are common in Java. (I
>                     guess there is a similar thing way we could do in
>                     python? I'd like to go back and refactor that as
>                     well when I am done)
>                      >>
>                      >> -------------
>                      >>
>                      >> As for your other suggestion to define metrics
>                     in the proto/enum file instead of the yaml file. I
>                     am not too sure about the best strategy for this. My
>                     initial thoughts are:
>                      >>
>                      >> Make a proto extension allowing you to
>                     describe/define a MonitoringInfo's (the same info as
>                     the metric_definitions.yaml file):
>                      >>
>                      >> URN
>                      >> Type
>                      >> Labels required
>                      >> Annotations: Description, Units, etc.
>                      >>
>                      >> Make the builder read in that MonitoringInfo
>                     definision/description assert everything is set
>                     properly? I think this would be a decent data driven
>                     approach.
>                      >>
>                      >> I was wondering if you had something else in mind?
>                      >>
>                      >> Thanks
>                      >> Alex
>                      >>
>                      >>
>