You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Maximilian Michels <mx...@apache.org> on 2018/11/02 10:57:37 UTC
Re: Follow up ideas, to simplify creating MonitoringInfos.
> I was unable to get this to compile and could find no examples of this on the proto github.
Would it be helpful to post the compiler output?
-Max
On 31.10.18 19:19, Lukasz Cwik wrote:
> I see and don't know how to help you beyond what your already
> suggesting. From what I remember, maps were added as syntactic sugar of
> lists of key value pairs.
>
> On Tue, Oct 30, 2018 at 5:37 PM Alex Amato <ajamato@google.com
> <ma...@google.com>> wrote:
>
> I am not sure on the correct syntax to populate the instances of my
> MonitoringInfoSpec messages
>
> message MonitoringInfoSpec {
>
> string urn = 1;
>
> string type_urn = 2;
>
> repeated string required_labels = 3;
>
> *map<string, string> annotations = 4;*
>
> }
>
>
> Notice how the annotations field is not used anywhere. I was unable
> to get this to compile and could find no examples of this on the
> proto github. Perhaps I'll have to reach out to them. I was
> wondering if anyone here was familiar first.
>
>
> message MonitoringInfoSpecs {
>
> enum MonitoringInfoSpecsEnum {
>
> USER_COUNTER = 0 [(monitoring_info_spec) = {
>
> urn: "beam:metric:user",
>
> type_urn: "beam:metrics:sum_int_64",
>
> }];
>
>
> ELEMENT_COUNT = 1 [(monitoring_info_spec) = {
>
> urn: "beam:metric:element_count:v1",
>
> type_urn: "beam:metrics:sum_int_64",
>
> required_labels: ["PTRANSFORM"],
>
> }];
>
>
> START_BUNDLE_MSECS = 2 [(monitoring_info_spec) = {
>
> urn: "beam:metric:pardo_execution_time:start_bundle_msecs:v1",
>
> type_urn: "beam:metrics:sum_int_64",
>
> required_labels: ["PTRANSFORM"],
>
> }];
>
>
> PROCESS_BUNDLE_MSECS = 3 [(monitoring_info_spec) = {
>
> urn: "beam:metric:pardo_execution_time:process_bundle_msecs:v1",
>
> type_urn: "beam:metrics:sum_int_64",
>
> required_labels: ["PTRANSFORM"],
>
> }];
>
>
> FINISH_BUNDLE_MSECS = 4 [(monitoring_info_spec) = {
>
> urn: "beam:metric:pardo_execution_time:finish_bundle_msecs:v1",
>
> type_urn: "beam:metrics:sum_int_64",
>
> required_labels: ["PTRANSFORM"],
>
> }];
>
>
> TOTAL_MSECS = 5 [(monitoring_info_spec) = {
>
> urn: "beam:metric:ptransform_execution_time:total_msecs:v1",
>
> type_urn: "beam:metrics:sum_int_64",
>
> required_labels: ["PTRANSFORM"],
>
> }];
>
> }
>
> }
>
>
>
>
> On Tue, Oct 30, 2018 at 2:01 PM Lukasz Cwik <lcwik@google.com
> <ma...@google.com>> wrote:
>
> I'm not sure what you mean by "Using a map in an option."
>
> For your second issue, the google docs around this show[1]:
>
> Note that if you want to use a custom option in a package other
> than the one in which it was defined, you must prefix the option
> name with the package name, just as you would for type names.
> For example:
>
> // foo.proto
> import "google/protobuf/descriptor.proto";
> package foo;
> extend google.protobuf.MessageOptions {
> optional string my_option = 51234;
> }
>
> // bar.proto
> import "foo.proto";
> package bar;
> message MyMessage {
> option (foo.my_option) = "Hello world!";
> }
>
>
> 1:
> https://developers.google.com/protocol-buffers/docs/proto#customoptions
>
>
> On Mon, Oct 29, 2018 at 5:19 PM Alex Amato <ajamato@google.com
> <ma...@google.com>> wrote:
>
> Hi Robert and community, :)
>
> I was starting to code this up, but I wasn't sure exactly
> how to do some of the proto syntax. Would you mind taking a
> look at this doc
> <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>
> and let me know if you know how to resolve any of these issues:
>
> * Using a map in an option.
> * Referring to string "constants" from other enum options
> in .proto files.
>
> Please see the comments I have listed in the doc
> <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>,
> and a few alternative suggestions.
> Thanks
>
> On Wed, Oct 24, 2018 at 10:08 AM Alex Amato
> <ajamato@google.com <ma...@google.com>> wrote:
>
> Okay. That makes sense. Using runtime validation and
> protos is what I was thinking as well.
> I'll include you as a reviewer in my PRs.
>
> As for the choice of a builder/constructor/factory. If
> we go with factory methods/constructor then we will need
> a method for each metric type (intCounter, latestInt64,
> ...). Plus, then I don't think we can have any
> constructor parameters for labels, we will just need to
> accept a dictionary for label keys+values like you say.
> This is because we cannot offer a method for each URN
> and its combination of labels, and we should avoid such
> static detection, as you say.
>
> The builder however, allows us to add a method for
> setting each label. Since there are a small number of
> labels I think this should be fine. A specific metric
> URN will have a specific set of labels which we can
> validate being set. Any variant of this should use a
> different label (or a new version in the label). Thus,
> the builder can give an advantage, making it easier to
> set label values without needing to provide the correct
> key for the label, when its set. You just need to call
> the correct method.
>
> Perhaps it might be best to leave this open to each SDK
> based on its language style (Builder, Factory, etc.) ,
> but make sure we use the protos+runtime validation.
>
> On Wed, Oct 24, 2018 at 7:02 AM Robert Bradshaw
> <robertwb@google.com <ma...@google.com>> wrote:
>
> Thanks for bringing this to the list; it's a good
> question.
>
> I think the difficulty comes from trying to
> statically define a lists
> of possibilities that should instead be runtime
> values. E.g. we
> currently we're up to about a dozen distinct types,
> and having a
> setter for each is both verbose and not very
> extensible (especially
> replicated cross language). I'm not sure the set of
> possible labels is
> fixed either. Generally in the FnAPI we've been
> using shared constants
> for this kind of thing instead. (I was wary about
> the protos for the
> same reasons; it would be good to avoid leaking this
> through to each
> of the various SDKs.) The amount of static
> typing/validation one gets
> depends on how much logic you build into each of
> these methods (e.g.
> does it (almost?) always "metric" which is assumed
> to already be
> encoded correctly, or a specific type that has
> tradeoffs with the
> amount you can do generically (e.g. we have code
> that first loops over
> counters, then over distributions, then over gauges,
> and I don't think
> we want continue that pattern all M places for all N
> types)).
>
> I would probably lean towards mostly doing runtime
> validation here.
> Specifically, one would have a data file that
> defines, for each known
> URN, its type along with the set of
> permitted/expected/required
> labels. On construction a MonitoringInfo data point,
> one would provide
> a URN + value + labelMap, and optionally a type. On
> construction, the
> constructor (method, factory, whatever) would look
> up the URN to
> determine the type (or throw an error if it's both
> not known and not
> provided), which could be then used to fetch an
> encoder of sorts (that
> can go from value <-> proto encoding, possibly with
> some validation).
> If labels and/or annotations are provided and the
> URN is known, we can
> validate these as well.
>
> As for proto/enums vs. yaml, the former is nice
> because code
> generation comes for free, but has turned out to be
> much more verbose
> (both the definition and the use) and I'm still on
> the fence whether
> it's a net win.
>
> (Unfortunately AutoValue won't help much here, as
> the ultimate goal is
> to wrap a very nested proto OneOf, ideally with some
> validation.
> (Also, in Python, generally, having optional, named
> arguments makes
> this kind of builder pattern less needed.))
>
> On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles
> <kenn@apache.org <ma...@apache.org>> wrote:
> >
> > FWIW AutoValue will build most of that class for
> you, if it is as you say.
> >
> > Kenn
> >
> > On Tue, Oct 23, 2018 at 6:04 PM Alex Amato
> <ajamato@google.com <ma...@google.com>> wrote:
> >>
> >> Hi Robert + beam dev list,
> >>
> >> I was thinking about your feedback in PR#6205,
> and agree that this monitoring_infos.py became a bit
> big.
> >>
> >> I'm working on the Java Implementation of this
> now, and want to incorporate some of these ideas and
> improve on this.
> >>
> >> I that that I should make something like a
> MonitoringInfoBuilder class. With a few methods
> >>
> >> setUrn
> >> setTimestamp
> >> setting the value (One method for each Type we
> support. Setting this will also set the type string)
> >>
> >> setInt64CounterValue
> >> setDoubleCounterValue
> >> setLatestInt64
> >> setTopNInt64
> >> setMonitoringDataTable
> >> setDistributionInt64
> >> ...
> >>
> >> setting labels (will set the key and value
> properly for the label)
> >>
> >> setPTransform(value)
> >> setPcollection(value)
> >> ...
> >>
> >>
> >> I think this will make building a metric much
> easier, you would just call the 4 methods and the
> .build(). These builders are common in Java. (I
> guess there is a similar thing way we could do in
> python? I'd like to go back and refactor that as
> well when I am done)
> >>
> >> -------------
> >>
> >> As for your other suggestion to define metrics
> in the proto/enum file instead of the yaml file. I
> am not too sure about the best strategy for this. My
> initial thoughts are:
> >>
> >> Make a proto extension allowing you to
> describe/define a MonitoringInfo's (the same info as
> the metric_definitions.yaml file):
> >>
> >> URN
> >> Type
> >> Labels required
> >> Annotations: Description, Units, etc.
> >>
> >> Make the builder read in that MonitoringInfo
> definision/description assert everything is set
> properly? I think this would be a decent data driven
> approach.
> >>
> >> I was wondering if you had something else in mind?
> >>
> >> Thanks
> >> Alex
> >>
> >>
>