You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2015/04/18 06:59:00 UTC

[jira] [Comment Edited] (MESOS-2629) Update style guide to disallow capture by reference of temporaries

    [ https://issues.apache.org/jira/browse/MESOS-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501097#comment-14501097 ] 

Michael Park edited comment on MESOS-2629 at 4/18/15 4:58 AM:
--------------------------------------------------------------

As suggested in [r33271|https://reviews.apache.org/r/33271], I'll try capture the full context and explanation regarding this topic.

h3. Brief History

Function parameters and local variables are similar in many ways e.g. both allocated on the stack, have the scope, a function argument is analogous to a variable initializer. The old K&R C syntax screams at the similarities as well:

{code}
void F(x) int x; { int y; }
{code}

Binding a temporary object to a const-ref *function parameter* makes sense since the temporary is guaranteed to stay alive until the function is fully evaluated. e.g. In the expression: {{F(create_temporary());}}, the result of {{create_temporary()}} is guaranteed to stay alive until the {{F}} is fully evaluated.

The feature to allow local variables to bind to a temporary was originally introduced to keep consistent with the above behavior since local variables are very similar to function parameters.

h3. Basic Rule for Lifetime of a Temporary Object

The basic rule for the lifetime of a temporary object is that it is destroyed at the end of the full expression involving the temporary object. I'll go into more depth in this post, but no one should have to remember further than this basic rule.

h3. Standardese

Let's get to the nitty gritty. The following is quoted from [N4296|http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf].

{quote}
4 There are two contexts in which temporaries are destroyed at a different point than the end of the full-expression.
The first context is when a default constructor is called to initialize an element of an array. If the constructor has one or more default arguments, the destruction of every temporary created in a default argument is sequenced before the construction of the next array element, if any.
{quote}

This one's not all that interesting, let's skip it.

{quote}
5 The second context is when a reference is bound to a temporary.116 The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except:
    (5.1) — A temporary object bound to a reference parameter in a function call (5.2.2) persists until the completion of the full-expression containing the call.
    (5.2) — The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.
    (5.3) — A temporary bound to a reference in a new-initializer (5.3.4) persists until the completion of the full-expression containing the new-initializer. [Example:
{code}
        struct S { int mi; const std::pair<int,int>& mp; };
        S a { 1, {2,3} };
        S* p = new S{ 1, {2,3} }; // Creates dangling reference
{code}
    — end example ] [ Note: This may introduce a dangling reference, and implementations are encouraged to issue a warning in such a case. — end note ]
{quote}

So (5) tells us that when a reference is bound to a temporary, in general, the temporary's lifetime is extended to the reference's lifetime except the 3 special cases listed as (5.1), (5.2) and (5.3). (5.1) allows temporaries to stick around for the full expression if it's bound to a reference parameter in a function call, which simply brings us back to the *Basic Rule* above. (5.2) says you can't return a temporary and bind it to a reference return type. i.e. we can't write code like the following and expect the temporary's lifetime to be extended.
{code} const Obj& createObj() { return Obj(); } {code}
(5.3): Well, there's an example for it. Basically, we can't extend the lifetime of a temporary bound to a reference in the heap.

h3. Pitfall

The major issue is that after looking at the code that (5) enables us to write:

{code}
const std::string& name = std::string("mpark");
{code}

*It misleads us to think that we can bind temporaries to a const-ref variable, then proceed to pass the const-ref variable around and it will keep the temporary alive*. This is the source of all the crazy. Here's probably the simplest example that breaks the above valid code.

{code}
const std::string& identity(const std::string& s) { return s; }
const std::string& name = identity(std::string("mpark"));
{code}

{{identity}} is a function that simply takes a string and returns it. Logically the code is no different, but {{name}} is now invalid. Why? Remember (5.2) which brings us back to the *Basic Rule*? {{std::string("mpark")}} is bound to a reference parameter of {{identity}}!

Recall that a member function is no different than a regular function except that it has an implicit object parameter as its first parameter. Therefore, (5.2) applies again when we call a member function on a temporary object which brings us back to the *Basic Rule* again. This is what we get hit by most often as illustrated by the *Mesos Case* section in the *Description*.

h3. Why Disallow the Simple Case?

There are a couple of reasons to disallow even the simple case. The main one is what is described in the *Reasoning* section in the *Description*, which is that it's very easy to turn the simple case into the complex case which causes hard-to-find bugs. The other is that it buys us absolutely nothing. Performance-wise, it's no better than what RVO and NRVO can do. In terms of "what people are used to", capturing temporaries by const-ref is not and has never been standard practice in the C++ community. There's no worries in regards to surprising new-comers.

h3. Bonus I: Reasoning Through the Rules

Reasoning about what the compiler has to do to implement this feature helps to clarify some of the rules and behaviors that may seem crazy. When the compiler sees a local const-ref variable directly bound to a temporary object, it creates a hidden variable in the same scope as the local variable and binds the temporary object to it. Essentially turning the following code

{code}
const std::string& name = std::string("mpark");
{code}

into

{code}
std::string hidden = std::string("mpark");
const std::string& name = hidden;
{code}

* This is how temporary object's lifetime is "extended" to the lifetime of the const-ref variable.
* This is why const-ref is no better in performance than RVO/NRVO, because it needs to essentially perform RVO/NRVO anyway.
* This is why (5.3) exists. Since we can't perform RVO/NRVO into heap memory, we can't extend the lifetime of temporaries bound to the reference members on the heap.
* In Herb's [GotW|http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/] article, the Guru question is

{quote}When the reference goes out of scope, which destructor gets called?{quote}

The answer is:
{quote}
You can take a const Base& to a Derived temporary and it will be destroyed without virtual dispatch on the destructor call.

This is nifty. Consider the following code:

{code}
// Example 3

Derived factory(); // construct a Derived object

void g() {
  const Base& b = factory(); // calls Derived::Derived here
  // … use b …
} // calls Derived::~Derived directly here — not Base::~Base + virtual dispatch!
{code}
{quote}

The compiler essentially turns the above code into:

{code}
void g() {
  Derived hidden = factory();
  const Base& b = hidden;
  // … use b …
}
{code}

So yes, it makes sense that {{Derived::~Derived}} would be invoked directly.

h3. Bonus II: What's "the most important *const*" referred to in the [GotW|http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/]?

{quote}
I once heard Andrei Alexandrescu give a talk on ScopeGuard (invented by Petru Marginean) where he used this C++ feature and called it "the most important *const* I ever wrote."
{quote}

{{ScopeGuard<Fn>}} (available in [Facebook Folly|https://github.com/facebook/folly/blob/master/folly/ScopeGuard.h]) is templated on the function object it holds. {{makeGuard(fn)}} is a factory function that returns an instance of {{ScopeGuard<Fn>}}. Since {{ScopeGuard}} was introduced before C\+\+11, they didn't have {{auto}} available to simply do {{auto guard = makeGuard(fn);}}. This lead to the following technique: (1) Make {{ScopeGuard<Fn>}} inherit from {{ScopeGuardBase}}, (2) Capture the temporary of type {{ScopeGuard<Fn>}} created by {{makeGuard(fn);}} by {{const ScopeGuardBase &}}. i.e. {{const ScopeGuardBase& guard = makeGuard(fn);}}. But with C++11's introduction of {{auto}}, this technique is now obsolete.


was (Author: mcypark):
As suggested in [r33271|https://reviews.apache.org/r/33271], I'll try capture the full context and explanation regarding this topic.

h3. Brief History

Function parameters and local variables are similar in many ways e.g. both allocated on the stack, have the scope, a function argument is analogous to a variable initializer. The old K&R C syntax screams at the similarities as well:

{code}
void F(x) int x; { int y; }
{code}

Binding a temporary object to a const-ref *function parameter* makes sense since the temporary is guaranteed to stay alive until the function is fully evaluated. e.g. In the expression: {{F(create_temporary());}}, the result of {{create_temporary()}} is guaranteed to stay alive until the {{F}} is fully evaluated.

The feature to allow local variables to bind to a temporary was originally introduced to keep consistent with the above behavior since local variables are very similar to function parameters.

h3. Basic Rule for Lifetime of a Temporary Object

The basic rule for the lifetime of a temporary object is that it is destroyed at the end of the full expression involving the temporary object. I'll go into more depth in this post, but no one should have to remember further than this basic rule.

h3. Standardese

Let's get to the nitty gritty. The following is quoted from [N4296|http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf].

{quote}
4 There are two contexts in which temporaries are destroyed at a different point than the end of the full-expression.
The first context is when a default constructor is called to initialize an element of an array. If the constructor has one or more default arguments, the destruction of every temporary created in a default argument is sequenced before the construction of the next array element, if any.
{quote}

This one's not all that interesting, let's skip it.

{quote}
5 The second context is when a reference is bound to a temporary.116 The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except:
    (5.1) — A temporary object bound to a reference parameter in a function call (5.2.2) persists until the completion of the full-expression containing the call.
    (5.2) — The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.
    (5.3) — A temporary bound to a reference in a new-initializer (5.3.4) persists until the completion of the full-expression containing the new-initializer. [Example:
{code}
        struct S { int mi; const std::pair<int,int>& mp; };
        S a { 1, {2,3} };
        S* p = new S{ 1, {2,3} }; // Creates dangling reference
{code}
    — end example ] [ Note: This may introduce a dangling reference, and implementations are encouraged to issue a warning in such a case. — end note ]
{quote}

So (5) tells us that when a reference is bound to a temporary, in general, the temporary's lifetime is extended to the reference's lifetime except the 3 special cases listed as (5.1), (5.2) and (5.3). (5.1) allows temporaries to stick around for the full expression if it's bound to a reference parameter in a function call, which simply brings us back to the *Basic Rule* above. (5.2) says you can't return a temporary and bind it to a reference return type. i.e. we can't write code like the following and expect the temporary's lifetime to be extended.
{code} const Obj& createObj() { return Obj(); } {code}.
(5.3): Well, there's an example for it. Basically, we can't extend the lifetime of a temporary bound to a reference in the heap.

h3. Pitfall

The major issue is that after looking at the code that (5) enables us to write:

{code}
const std::string& name = std::string("mpark");
{code}

*It misleads us to think that we can bind temporaries to a const-ref variable, then proceed to pass the const-ref variable around and it will keep the temporary alive*. This is the source of all the crazy. Here's probably the simplest example that breaks the above valid code.

{code}
const std::string& identity(const std::string& s) { return s; }
const std::string& name = identity(std::string("mpark"));
{code}

{{identity}} is a function that simply takes a string and returns it. Logically the code is no different, but {{name}} is now invalid. Why? Remember (5.2) which brings us back to the *Basic Rule*? {{std::string("mpark")}} is bound to a reference parameter of {{identity}}!

Recall that a member function is no different than a regular function except that it has an implicit object parameter as its first parameter. Therefore, (5.2) applies again when we call a member function on a temporary object which brings us back to the *Basic Rule* again. This is what we get hit by most often as illustrated by the *Mesos Case* section in the *Description*.

h3. Why Disallow the Simple Case?

There are a couple of reasons to disallow even the simple case. The main one is what is described in the *Reasoning* section in the *Description*, which is that it's very easy to turn the simple case into the complex case which causes hard-to-find bugs. The other is that it buys us absolutely nothing. Performance-wise, it's no better than what RVO and NRVO can do. In terms of "what people are used to", capturing temporaries by const-ref is not and has never been standard practice in the C++ community. There's no worries in regards to surprising new-comers.

h3. Bonus I: Reasoning Through the Rules

Reasoning about what the compiler has to do to implement this feature helps to clarify some of the rules and behaviors that may seem crazy. When the compiler sees a local const-ref variable directly bound to a temporary object, it creates a hidden variable in the same scope as the local variable and binds the temporary object to it. Essentially turning the following code

{code}
const std::string& name = std::string("mpark");
{code}

into

{code}
std::string hidden = std::string("mpark");
const std::string& name = hidden;
{code}

* This is how temporary object's lifetime is "extended" to the lifetime of the const-ref variable.
* This is why const-ref is no better in performance than RVO/NRVO, because it needs to essentially perform RVO/NRVO anyway.
* This is why (5.3) exists. Since we can't perform RVO/NRVO into heap memory, we can't extend the lifetime of temporaries bound to the reference members on the heap.
* In Herb's [GotW|http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/] article, the Guru question is

{quote}When the reference goes out of scope, which destructor gets called?{quote}

The answer is:
{quote}
You can take a const Base& to a Derived temporary and it will be destroyed without virtual dispatch on the destructor call.

This is nifty. Consider the following code:

{code}
// Example 3

Derived factory(); // construct a Derived object

void g() {
  const Base& b = factory(); // calls Derived::Derived here
  // … use b …
} // calls Derived::~Derived directly here — not Base::~Base + virtual dispatch!
{code}
{quote}

The compiler essentially turns the above code into:

{code}
void g() {
  Derived hidden = factory();
  const Base& b = hidden;
  // … use b …
}
{code}

So yes, it makes sense that {{Derived::~Derived}} would be invoked directly.

h3. Bonus II: What's "the most important *const*" referred to in the [GotW|http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/]?

{quote}
I once heard Andrei Alexandrescu give a talk on ScopeGuard (invented by Petru Marginean) where he used this C++ feature and called it "the most important *const* I ever wrote."
{quote}

{{ScopeGuard<Fn>}} (available in [Facebook Folly|https://github.com/facebook/folly/blob/master/folly/ScopeGuard.h]) is templated on the function object it holds. {{makeGuard(fn)}} is a factory function that returns an instance of {{ScopeGuard<Fn>}}. Since {{ScopeGuard}} was introduced before C\+\+11, they didn't have {{auto}} available to simply do {{auto guard = makeGuard(fn);}}. This lead to the following technique: (1) Make {{ScopeGuard<Fn>}} inherit from {{ScopeGuardBase}}, (2) Capture the temporary of type {{ScopeGuard<Fn>}} created by {{makeGuard(fn);}} by {{const ScopeGuardBase &}}. i.e. {{const ScopeGuardBase& guard = makeGuard(fn);}}. But with C++11's introduction of {{auto}}, this technique is now obsolete.

> Update style guide to disallow capture by reference of temporaries
> ------------------------------------------------------------------
>
>                 Key: MESOS-2629
>                 URL: https://issues.apache.org/jira/browse/MESOS-2629
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Joris Van Remoortere
>            Assignee: Joris Van Remoortere
>
> We modify the style guide to disallow constant references to temporaries as a whole. This means disallowing both (1) and (2) below.
> h3. Background
> 1. Constant references to simple expression temporaries do extend the lifetime of the temporary till end of function scope:
> * Temporary returned by function:
>   {code}
>   // See full example below.
>   T f(const char* s) { return T(s); }
>   {
>     const T& good = f("Ok");
>     // use of good is ok.
>   }
>   {code}
> * Temporary constructed as simple expression:
>   {code}
>   // See full example below.
>   {
>     const T& good = T("Ok");
>     // use of good is ok.
>   }
>   {code}
> 2. Constant references to expressions that result in a reference to a temporary do not extend the lifetime of the temporary:
>   * Temporary returned by function:
>   {code}
>   // See full example below.
>   T f(const char* s) { return T(s); }
>   {
>     const T& bad = f("Bad!").Member();
>     // use of bad is invalid.
>   }
>   {code}
>   * Temporary constructed as simple expression:
>   {code}
>   // See full example below.
>   {
>     const T& bad = T("Bad!").Member();
>     // use of bad is invalid.
>   }
>   {code}
> h3. Mesos Case
>   - In Mesos we use Future<T> a lot. Many of our functions return Futures by value:
>   {code}
>   class Socket {
>     Future<Socket> accept();
>     Future<size_t> recv(char* data, size_t size);
>     ...
>   }
>   {code}
>   - Sometimes we capture these Futures:
>   {code}
>   {
>     const Future<Socket>& accepted = socket.accept(); // Valid c++, propose we disallow.
>   }
>   {code}
>   - Sometimes we chain these Futures:
>   {code}
>   {
>     socket.accept().then(lambda::bind(_accepted)); // Temporary will be valid during 'then' expression evaluation.
>   }
>   {code}
>   - Sometimes we do both:
>   {code}
>   {
>     const Future<Socket>& accepted = socket.accept().then(lambda::bind(_accepted)); // Dangerous! 'accepted' lifetime will not be valid till end of scope. Disallow!
>   }
>   {code}
> h3. Reasoning
> - Although (1) is ok, and considered a [feature|http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/], (2) is extremely dangerous and leads to hard to track bugs.
> - If we explicitly allow (1), but disallow (2), then my worry is that someone coming along to maintain the code later on may accidentally turn (1) into (2), without recognizing the severity of this mistake. For example:
> {code}
> // Original code:
> const T& val = T();
> std::cout << val << std::endl;
> // New code:
> const T& val = T().removeWhiteSpace();
> std::cout << val << std::endl; // val could be corrupted since the destructor has been invoked and T's memory freed.
> {code}
> - If we disallow both cases: it will be easier to catch these mistakes early on in code reviews (and avoid these painful bugs), at the same cost of introducing a new style guide rule.
> h3. Performance Implications
> - BenH suggests c++ developers are commonly taught to capture by constant reference to hint to the compiler that the copy can be elided.
> - Modern compilers use a Data Flow Graph to make optimizations such as
>   - *In-place-construction*: leveraged by RVO and NRVO to construct the object in place on the stack. Similar to "*Placement new*": http://en.wikipedia.org/wiki/Placement_syntax
>   - *RVO* (Return Value Optimization): http://en.wikipedia.org/wiki/Return_value_optimization
>   - *NRVO* (Named Return Value Optimization): https://msdn.microsoft.com/en-us/library/ms364057%28v=vs.80%29.aspx
> - Since modern compilers perform these optimizations, we no longer need to 'hint' to the compiler that the copies can be elided.
> h3. Example program
> {code}
> #include <stdio.h>
> class T {
> public:
>   T(const char* str) : Str(str) {
>     printf("+ T(%s)\n", Str);
>   }
>   ~T() {
>     printf("- T(%s)\n", Str);
>   }
>   const T& Member() const
>   {
>     return *this;
>   }
> private:
>   const char* Str;
> };
> T f(const char* s) { return T(s); }
> int main() {
>   const T& good = T("Ok");
>   const T& good_f = f("Ok function");
>   const T& bad = T("Bad!").Member();
>   const T& bad_f = T("Bad function!").Member();
>   printf("End of function scope...\n");
> }
> {code}
> Output:
> {code}
> + T(Ok)
> + T(Ok function)
> + T(Bad!)
> - T(Bad!)
> + T(Bad function!)
> - T(Bad function!)
> End of function scope...
> - T(Ok function)
> - T(Ok)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)