You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Benjamin Mahler <bm...@apache.org> on 2016/07/15 22:03:08 UTC
Helper Functions and Code Readability

I was reviewing code and noticed the following helper in the allocator
benchmarks [1]:

    // This returns a `Labels` that has 12 key-value pairs, which should
    // be more than we expect most frameworks to use in practice. We
    // ensure that the first 11 key-value pairs are equal, which results
    // in pessimal performance for the equality operator between
    // Labels. Finally, we add `labelId` to allow the caller to ensure
    // that all labels in the cluster are distinct, which can trigger
    // allocator performance bottlenecks.
    static Labels makeLabels(bool first, size_t labelId);

This helper leads to code that is difficult to read and understand,
consider this call-site:

    Labels labels1 = makeLabels(true, i);
    Labels labels2 = makeLabels(false, i);

It is impossible to correctly intuit what this does (what do the labels
look like? what is the boolean used for? how many labels does it create?).
The one thing I would intuit from this signature is actually incorrect: I'd
intuit that the second parameter is the number of labels to create (how
else would it know how many to create?).

If I look at the signature to see the argument names ('first' and
'labelId') I still can't intuit what this does (the 'labelId' must be
inside each label..? It's not, only in the last one). Only after I read the
lengthy documentation do I see the details. The boolean is still a mystery
though (not documented)..

Here's a suggestion to increase readability:

    // Returns the requested number of labels:
    //   [{"<key>_1": "<value>_1"}, ..., {"<key>_<count>":
"<value>_<count>"}]
    static Labels makeLabels(const string& key, const string& value, size_t
count);

Then, the highly specific details of using 12 keys (with the last one being
different) can be placed local to the call-site that needs those details:

    // We create reservations with 12 labels as we expect this is
    // more than most frameworks use. Note that only the 12th
    // label differs between the two sets of labels as this triggers
    // the pathological performance path in the Labels equality
    // operator.
    //
    // We add a unique id to each agent's reservation labels to
    // ensure that any aggregation across agents leads to
    // pathological performance (reservations with distinct labels
    // cannot be merged).
    //
    // TODO(neilc): Test with longer key / value lengths.
    Labels labels1 = createLabels("key", "value", 11);
    labels1.add_label()->CopyFrom(createLabel("unique_key_1", "value_" +
id));

    Labels labels2 = createLabels("key", "value", 11);
    labels1.add_label()->CopyFrom(createLabel("unique_key_2", "value_" +
id));

Note that I used "key" and "value" rather than "foo", "bar", "baz", "bar1",
"baz1" as used in the existing code as this also aids readability.

This call-site is now "readable" in that I do not need to consult the
documentation for createLabel and createLabels to infer the behavior (one
should be able to intuit the meaning of 11: createLabels needs to know how
many to create -> that must be passed as an argument). Since this isn't a
lot of code (2 more lines) and it's readable, I'd opt to have callers
compose the generic helpers instead of trying to cut out this small amount
of duplication.

The way we write our helpers can have a huge impact (for better or worse)
on readability, something to consider when adding new ones.

Ben

[1]
https://github.com/apache/mesos/blob/165a6c79450ed78dce2adc944e6741c3aab66c8f/src/tests/hierarchical_allocator_tests.cpp#L3491-L3516