Posted to dev@mesos.apache.org by Jeff Coffler <Je...@microsoft.com.INVALID> on 2017/02/14 19:28:50 UTC

Proposal for Mesos Build Improvements

Proposal For Build Improvements

The Mesos build process is in dire need of some build infrastructure improvements. These changes will improve the speed and ease of work within individual components and dramatically reduce overall build time, especially in the Windows environment, but likely in the Linux environment as well.


Background:

It is currently recommended to use the ccache project with the Mesos build process. This makes the Linux build process more tolerable in terms of speed, but unfortunately such software is not available on Windows. Ultimately, though, the caching software is covering up two fundamental flaws in the overall build process:

1. Lack of use of libraries
2. Lack of precompiled headers

Without libraries, the overall build process is often much longer, particularly when a lot of work is being done in a particular component. With libraries, only the library for that component needs to be rebuilt (and then the overall image relinked). Currently, since there is no such modularization, all source files must be considered at build time. Interestingly enough, there is such modularization in the source code layout; that modularization just isn't utilized at the compiler level.

Precompiled headers exist on both Windows and Linux. For Linux, you can refer to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight from the GNU CC documentation: "The time the compiler takes to process these header files over and over again can account for nearly all of the time required to build the project."

In my prior use of precompiled headers, each C or C++ file generally took about 4 seconds to compile. After switching to precompiled headers, the precompiled header creation took about 4 seconds, but each C/C++ file now took about 200 milliseconds to compile. The overall build time was thus dramatically reduced.
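
To make the arithmetic concrete, here is a rough sketch in Python using the per-file timings quoted above. The file count of 500 is made up for illustration, the model assumes a strictly serial build, and none of the names come from the Mesos tree.

```python
def total_build_ms(num_files, per_file_ms, pch_ms=0):
    """Serial build time: a one-off PCH cost plus a per-file compile cost."""
    return pch_ms + num_files * per_file_ms


NUM_FILES = 500  # hypothetical project size, not a Mesos measurement

without_pch_ms = total_build_ms(NUM_FILES, per_file_ms=4000)
with_pch_ms = total_build_ms(NUM_FILES, per_file_ms=200, pch_ms=4000)

print(f"without PCH: {without_pch_ms / 1000:.0f} s")
print(f"with PCH:    {with_pch_ms / 1000:.0f} s")
```

Under these assumptions, the one-off cost of generating the precompiled header is recovered once a second source file is compiled.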


Scope of Changes:

These changes are only being proposed for the CMake system. Going forward, the CMake system is the easiest way to maintain some level of portability between the Linux and Windows platforms.


Details for Modularization:

For the modularization, the intent is simply to compile each functionally separate source directory into an archive (.a) file. These archive files will then be linked together to form the actual executables. These changes will primarily be in the CMake system, and should have limited effect on any actual source code.
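
As a rough illustration of why this helps incremental builds, the Python sketch below maps changed files to the archives that would need rebuilding. The one-archive-per-directory naming rule and the example paths are assumptions for illustration, not the actual CMake targets.

```python
from pathlib import PurePosixPath


def archives_to_rebuild(changed_files, source_root="src"):
    """Return the archives affected by a set of changed source files.

    Assumes (hypothetically) that each directory directly under
    source_root is built into its own lib<name>.a archive.
    """
    archives = set()
    for path in changed_files:
        parts = PurePosixPath(path).relative_to(source_root).parts
        if len(parts) > 1:  # file lives inside a component directory
            archives.add(f"lib{parts[0]}.a")
    return sorted(archives)


changed = ["src/master/master.cpp", "src/master/http.cpp", "src/slave/slave.cpp"]
print(archives_to_rebuild(changed))
```

Only the listed archives would be recompiled; everything else would just be relinked.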

At a later date, if it makes sense, we can look at building shared library (.so) files. However, this only makes the most sense if the code is truly shared between different executable files. If that's not the case, then it likely makes sense just to stick with .a files. Regardless, generation of .so files is out of scope for this change.


Details for Precompiled Header Changes:

Precompiled headers will make the use of stout (a very large header-only library) essentially "free" from a compile-time overhead point of view. Basically, the compiler takes a list of header files (including very long header files, like "windows.h") and generates its in-memory structures for their representation.

During precompiled header generation, these memory structures are flushed to disk. Then, when components are built, the memory structures are reloaded from disk, which is dramatically faster than actually parsing the tens of thousands of lines of header files and building the memory structures.

For precompiled headers to be useful, a relatively "consistent" set of headers must be included by all of the C/C++ files. So, for example, consider the following C file:

#if defined(windows)
#include <windows.h>
#endif

#include <header-a>
#include <header-b>
#include <header-c>

< - Remainder of module - >

To make a precompiled header for this module, all of the #include files would be included in a new file, mesos_common.h. The C file would then be changed as follows:

#include "mesos_common.h"

< - Remainder of module - >

Structurally, the code is identical, and need not be built with precompiled headers. However, use of precompiled headers will make file compilation dramatically faster.

Note that other include files can be included after the precompiled header if appropriate. For example, the following is valid:

#include "mesos_common.h"
#include <header-d>

< - Remainder of module - >

For efficiency purposes, if a header file is included by 50% or more of the source files, it should be included in the precompiled header. If a header is included in fewer than 50% of the source files, then it can be separately included (and thus would not benefit from precompiled headers). Note that this is a guideline; even if a header is used by less than 50% of source files, if it's very large, we still may decide to throw it in the precompiled header.
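
The 50% guideline could be checked mechanically. The following is a minimal Python sketch, assuming the sources are available as plain text; the function name, the regular expression, and the sample files are all hypothetical and ignore conditional compilation.

```python
import re
from collections import Counter

# Matches '#include <x>' and '#include "x"' at the start of a line.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"][^>"]+[>"])', re.MULTILINE)


def pick_pch_headers(sources, threshold=0.5):
    """Return headers included by at least `threshold` of the source files.

    `sources` maps a file name to its text; a header counts once per file.
    """
    counts = Counter()
    for text in sources.values():
        counts.update(set(INCLUDE_RE.findall(text)))
    cutoff = threshold * len(sources)
    return sorted(h for h, n in counts.items() if n >= cutoff)


example_sources = {
    "a.cpp": '#include <string>\n#include <vector>\n',
    "b.cpp": '#include <string>\n#include "foo.hpp"\n',
    "c.cpp": '#include <string>\n#include <vector>\n#include <map>\n',
    "d.cpp": '#include <set>\n',
}
print(pick_pch_headers(example_sources))
```

A size-based exception, as described above for very large headers, would need an extra input such as line counts and is left out of this sketch.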

Note that, for use of precompiled headers, there will be a great deal of code churn (almost exclusively in the #include list of source files). This will mean that there will be a lot of code merges, but ultimately no "code logic" changes. If merges are not done in a timely fashion, this can easily result in needless hand merging of changes. Due to these issues, we will need a dedicated shepherd who will integrate the patches quickly. This kind of work is easily invalidated when the include list is changed by another developer, forcing us to redo the patch. [Note that Joseph has stepped up to the plate for this, thanks Joseph!]


This is the end of my proposal; feedback would be appreciated.

Re: Proposal for Mesos Build Improvements

Posted by Alex Clemmer <cl...@gmail.com>.
Just to add a bit of context, the history of the issue of build time is
tracked in MESOS-1582[1], and most recently[2].

Speaking personally, I'm excited about _any_ progress in this area,
because (1) the Windows build times are completely unbearable, and (2)
because getting the build times down benefits the whole community.

When it was basically just me working on the Windows code paths, this
issue was tolerable, but now that we have multiple people working
full-time, it is really important to start fixing the issue.

[1] https://issues.apache.org/jira/browse/MESOS-1582
[2]
https://issues.apache.org/jira/browse/MESOS-1582?focusedCommentId=15828645&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15828645


__
Transcribed by my voice-enabled refrigerator, please pardon chilly messages.


RE: Proposal for Mesos Build Improvements

Posted by Jeff Coffler <Je...@microsoft.com.INVALID>.
I agree with the first part (PCHs will introduce some additional dependencies for some object files).

I do not agree with the second point (PCHs become bloated over time, thus reducing time savings).

The key point with PCH is that mesos_common.h is only read ONCE. Even if large and bloated, since it's only read once, the time savings is not diminished at all.

/Jeff


Re: Proposal for Mesos Build Improvements

Posted by Alex Clemmer <cl...@gmail.com>.
Yes, that is right, PCHs would probably introduce some additional
dependencies for some object files, and if those PCHs become bloated
over time, then you can expect this to be expressed as diminishing time
savings.

This does imply that maintaining PCHs will require at least some work.


__
Transcribed by my voice-enabled refrigerator, please pardon chilly messages.

On Wed, 15 Feb 2017, Neil Conway wrote:

> On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
> <Je...@microsoft.com.invalid> wrote:
>> For efficiency purposes, if a header file is included by 50% or more of the source files, it should be included in the precompiled header. If a header is included in fewer than 50% of the source files, then it can be separately included (and thus would not benefit from precompiled headers). Note that this is a guideline; even if a header is used by less than 50% of source files, if it's very large, we still may decide to throw it in the precompiled header.
>
> It seems like this would have the effect of creating many false
> dependencies: if file X doesn't currently include header Y but Y is
> included in the precompiled header, the symbols in Y will now be
> visible when X is compiled. It would also mean that X would need to be
> recompiled when Y changes.
>
> Related: the current policy is that headers and implementation files
> should try to include all of their dependencies, without relying on
> transitive includes. For example, if foo.cpp includes bar.hpp, which
> includes <vector>, but foo.cpp also uses <vector>, both foo.cpp and
> bar.hpp should "#include <vector>". Adopting precompiled headers would
> mean making an exception to this policy, right?
>
> I wonder if we should instead use headers like:
>
> <- mesos_common.h ->
> #include <a>
> #include <b>
> #include <c>
>
> <- xyz.cpp, which needs headers "b" and "d" ->
> #include "mesos_common.h"
>
> #include <b>
> #include <d>
>
> That way, the fact that "xyz.cpp" logically depends on <b> (but not
> <a> or <c>) is not obscured (in other words, Mesos should continue to
> compile if 'mesos_common.h' is replaced with an empty file). Does
> anyone know whether the header guard in <b> _should_ make the repeated
> inclusion of <b> relatively cheap?
>
> Neil
>

Re: Proposal for Mesos Build Improvements

Posted by Neil Conway <ne...@gmail.com>.
On Wed, Feb 15, 2017 at 1:59 PM, Jeff Coffler
<Je...@microsoft.com.invalid> wrote:
> 3. Maintaining the correct includes is nice, but not at the cost of compiler speed.

Personally, I would invert these statements -- but until we know the
cost of the redundant includes, probably not worth debating further.

> 4. I totally disagree about auto-generating the PCH. We should go through the sources and pick what makes sense. Auto-generating implies that we auto-generate all the time (on every build), and I'd rather not scan the sources during a build (with an associated speed hit) just to try and speed up the build.

The problem is that "what makes sense" will change over time.
Auto-generating the PCH certainly doesn't mean it needs to be
generated as part of the build process: a script (or docker container)
to generate "mesos_common.hpp" on-demand would be fine with me, as
long as it is a mechanical process.
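
A minimal Python sketch of what such a mechanical generation step might look like, assuming the list of headers has already been chosen by some other means. The function name and the include-guard macro are hypothetical.

```python
def render_common_header(headers, guard="MESOS_COMMON_HPP"):
    """Render a deterministic common header from a collection of includes.

    Each entry is written as it would appear after `#include`,
    for example "<vector>" or '"foo.hpp"'.
    """
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    # Sorting and de-duplicating makes the output independent of input order.
    lines += [f"#include {h}" for h in sorted(set(headers))]
    lines += ["", f"#endif // {guard}", ""]
    return "\n".join(lines)


print(render_common_header(["<vector>", "<string>", "<vector>"]))
```

Because the output depends only on the set of headers, regenerating the file on demand produces no diff unless the chosen headers actually change.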

Neil

RE: Proposal for Mesos Build Improvements

Posted by Jeff Coffler <Je...@microsoft.com.INVALID>.
I'm planning on prototyping this just to generate numbers. I don't think I need permission to do that! But, of course, to incorporate any changes into the code base, we need consensus.

I agree that stout optimizations are outside of the scope of this discussion. Any stout optimizations are orthogonal to PCH, and thus they need not be linked together. Note that stout optimizations may be less "pressing" with PCH, but still it's separate. The fact that PCH may help stout just indicates that PCH is a good thing, particularly on platforms like Windows (where we get to include windows.h, a massive file).

Also, I wanted to clarify a message from Benjamin. I did NOT mean to imply that PCH takes 20 seconds to generate. I was simply saying that PCH reads the headers ONCE and generates the PCH. As such, I don't believe that "bloat" is an issue here. In actuality, generating the PCH takes about as long as reading the headers. But you read them once to generate the PCH; you don't read them once for each source file. That's the speed-up for PCH; a ton of header processing is done once. When I used PCH in the past, it took about 4 seconds to read all my headers. That 4 seconds was then subtracted from all the source compilations. That is, 4 seconds to generate, then all the compiles were 4 seconds faster.

Regarding Andy's points:

1. I agree, we need a benchmarked prototype. Note that I will only benchmark a particular directory, I don't intend to benchmark EVERYTHING. One directory should give us enough of an idea to see how it works.

2. Maintaining ccache compatibility is a good thing. BUT I don't think it's a hard requirement. If PCH on Linux gives us reasonable performance without ccache, then I don't see a lot of value in maintaining ccache compatibility. Now, that said, I will try to do so (why not?). But I'm not sure if these workarounds for ccache will work on Windows; we'll see during the prototyping stage.

3. Maintaining the correct includes is nice, but not at the cost of compiler speed. I'm not sure if Windows has "multiple include optimizations". I will include this in my prototyping. If it does, then I agree it would be very nice to maintain this. BUT in practice, it will be hard over time. After all, if you include mesos_common.h (either literally or by build system), you may not realize that you're missing an include without that. And I don't think it's "worth it" to build twice to catch this, once with PCH and once without. That's ugly, in my honest opinion.

4. I totally disagree about auto-generating the PCH. We should go through the sources and pick what makes sense. Auto-generating implies that we auto-generate all the time (on every build), and I'd rather not scan the sources during a build (with an associated speed hit) just to try and speed up the build.

Let me get some hard numbers under my belt. From that, we can make intelligent decisions about where to go.

/Jeff



Re: Proposal for Mesos Build Improvements

Posted by Andy Schwartzmeyer <an...@microsoft.com.INVALID>.
Hi,

I worked with Jeff on the initial proposal for pre-compiled headers and library refactor. I think this thread should focus on the former, potentially implementing pre-compiled headers, and have a separate conversation on Jeff's original second suggestion of using more libraries inside Mesos.

With that in mind, I think we have some requirements for the pre-compiled header implementation.

* First and foremost, we need a benchmarked prototype that proves pre-compiled headers provide a considerable speed-up. As the most complex headers are those of the header-only Stout library, we should also benchmark improvements from making Stout non-header-only, and then prioritize; but this will likely be a separate discussion.

* We must maintain ccache compatibility, as the majority of Mesos developers already use ccache. It appears the most straightforward way to do this is to _not_ `#include common.h`, but to `-include` it; this fits well with the next requirement.

* We must maintain correct includes; i.e. Mesos should be compilable without the pre-compiled header. Because of multiple-include optimization, this should not affect the gains from the use of pre-compiled headers. Again, this fits well with the next requirement.

* We should automatically generate the pre-compiled header, as this eliminates manual maintenance. Combined with the above two points, this approach should actually negate the original code-churn problem. By generating a common header to pre-compile, and using `-include`, we will not have to modify existing source files. This would both give us ccache compatibility and ensure that the correct includes would be maintained (and thus can be refactored independently of this work).
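
As a sketch of the `-include` approach, the Python helper below builds GCC-style command lines: one to precompile a common header and one to compile a source file with that header force-included. The file names are hypothetical, a real build would need to pass identical flags to both steps, and MSVC uses different switches that are not covered here.

```python
def pch_commands(common_header, source, compiler="g++"):
    """Build GCC-style command lines for a precompiled common header.

    The first command precompiles the header; the second compiles one
    source file and force-includes the header with -include, so the
    source file itself is left unmodified.
    """
    precompile = [compiler, "-x", "c++-header", common_header,
                  "-o", common_header + ".gch"]
    obj = source.rsplit(".", 1)[0] + ".o"
    compile_source = [compiler, "-include", common_header,
                      "-c", source, "-o", obj]
    return precompile, compile_source


precompile, compile_source = pch_commands("mesos_common.hpp", "xyz.cpp")
print(" ".join(precompile))
print(" ".join(compile_source))
```

Dropping the `-include` argument gives the plain build, which is one way to check that the sources still compile without the precompiled header.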

Did I miss any points, or can we move forward with prototyping this?

Thanks,

-- Andy


Re: Proposal for Mesos Build Improvements

Posted by Benjamin Bannier <be...@mesosphere.io>.
Hi,

> I wonder if we should instead use headers like:
> 
> <- mesos_common.h ->
> #include <a>
> #include <b>
> #include <c>
> 
> <- xyz.cpp, which needs headers "b" and "d" ->
> #include "mesos_common.h"
> 
> #include <b>
> #include <d>
> 
> That way, the fact that "xyz.cpp" logically depends on <b> (but not
> <a> or <c>) is not obscured (in other words, Mesos should continue to
> compile if 'mesos_common.h' is replaced with an empty file).

That’s an interesting angle for a number of reasons. It would allow local reasoning about correct includes, and it also appears to help maintain support for ccache’d builds,

  https://ccache.samba.org/manual.html#_precompiled_headers

For that one could include project headers such as `mesos_common.h` via a command line switch to the compiler invocation, without the need to make any changes to source files (possibly an interesting way to create some benchmarking POC of this proposal).
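
As a rough sketch of that idea: GCC and Clang accept `-include <file>` (MSVC: `/FI`), which processes a header before the first line of the source file, so a translation unit like the hypothetical `xyz.cpp` below never has to mention `mesos_common.h`. The file name, the `joinWords` helper and the command lines in the comments are illustrative assumptions, not part of the Mesos tree.

```cpp
// xyz.cpp (hypothetical): lists only the headers it logically depends on.
// It never names mesos_common.h, so it builds the same with or without PCH.
//
// Plain build:
//   g++ -std=c++11 -c xyz.cpp
// Opt-in build with a forced include (GCC/Clang syntax):
//   g++ -std=c++11 -include mesos_common.h -c xyz.cpp
// GCC can then pick up mesos_common.h.gch if one exists alongside the header
// and was built with compatible flags.

#include <string>
#include <vector>

// Joins words with single spaces; stands in for "the remainder of the module".
std::string joinWords(const std::vector<std::string>& words)
{
  std::string result;
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (i > 0) {
      result += ' ';
    }
    result += words[i];
  }
  return result;
}
```

Because the source file is identical in both builds, timing the two command lines against each other would be one way to get a first benchmark without touching the tree.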

Not changing source files for this would be valuable, as it would keep build setup idiosyncrasies out of the source. If we didn't change files, PCH use could remain opt-in. Right now a ccache build of the Mesos source files and tests with a warm ccache takes less than 50s on my 8-core machine (a substantial fraction of this time is spent in serializing (non-parallelizable) linking steps, and I'd bet there is also some ~10s overhead from Make stat'ing files and changing directories in there).

Generating precompiled headers would throw in an additional serializing step, and even if it really took only 20s to generate a header, as guesstimated by Jeff, we would already be approaching a point of diminishing returns on platforms with ccache, even if we compiled every source file in no time.

> Does anyone know whether the header guard in <b> _should_ make the repeated
> inclusion of <b> relatively cheap?

Not sure how much information gcc or clang would need to serialize from the PCH, but there is of course some form of multi-include optimization in both gcc and clang, see e.g.,

  https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
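
For reference, the pattern that optimization keys on looks roughly like the sketch below. The two guarded regions simulate what the preprocessor sees when a hypothetical `b.hpp` is included twice in one translation unit; `B_HPP`, `struct B` and `valueOf` are made-up names. As the page above describes, GCC remembers the controlling macro of a fully guarded header and can skip re-reading the file on later includes while that macro is still defined.

```cpp
// First "#include <b.hpp>": B_HPP is undefined, so the body is compiled.
#ifndef B_HPP
#define B_HPP

struct B
{
  int value;
};

#endif // B_HPP

// Second "#include <b.hpp>": B_HPP is now defined, so the preprocessor skips
// everything up to #endif. Without the guard this would be a redefinition.
#ifndef B_HPP
#define B_HPP

struct B
{
  int value;
};

#endif // B_HPP

// Returns the stored value; compiles only because B is defined exactly once.
int valueOf(const B& b)
{
  return b.value;
}
```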


Cheers,

Benjamin

Re: Proposal for Mesos Build Improvements

Posted by Vinod Kone <vi...@apache.org>.
Thanks Jeff for the proposal! Faster builds for Mesos have been a long
awaited feature, so great to see some real traction here.

Regarding benchmarks, would it be possible to have benchmarks (for clean
build and incremental build) with 1) the PCH change only, 2) stout
non-header-only, and 3) 1+2? Not sure how easy it is to prototype these to
get benchmarks, but I think having those numbers would clearly show us
which one we should prioritize.

Thanks,


On Wed, Feb 15, 2017 at 11:34 AM, Neil Conway <ne...@gmail.com> wrote:

> Hi Jeff,
>
> Gotcha -- I just wanted to understand the tradeoffs here.
>
> I'd definitely prefer an approach in which we include "<xyz>" in both
> "mesos_common.hpp" and each individual file that logically depends on
> "<xyz>". This makes clear the dependencies between modules and also
> makes it easy to disable building with PCH (see also the
> recommendations in [1]). If the only reason to avoid this is the cost of
> the repeated include, it would be important to see benchmarks that
> justify this.
>
> BTW, I think it's important that we script/automate this as far as
> possible, e.g., using a script to decide which headers are included
> often enough to justify being included in the PCH. This should avoid
> the PCH getting out of date, as well as innumerable arguments down the
> road about whether header X warrants being added to the PCH :)
>
> Overall, sounds cool to me! Faster builds would be fantastic.
>
> Neil
>
> [1] http://gamesfromwithin.com/the-care-and-feeding-of-pre-compiled-headers
>
> On Wed, Feb 15, 2017 at 11:26 AM, Jeff Coffler
> <Je...@microsoft.com.invalid> wrote:
> > Hi Neil,
> >
> > What you're saying is essentially correct. If mesos_common.h includes a
> bunch of, well, "common" stuff, and everybody includes mesos_common.h, then
> those files will, by definition, have at least some number of items that
> they didn't need.
> >
> > Since PCH works on both Windows and Linux, I don't think this is a "bad
> thing". It's a trade-off. Is a (what I believe to be) very significant
> speed-up in compile speed "worth it"? (Obviously, since I submitted the
> proposal, I think so. But this is a very valid point).
> >
> >  Yes, header guards will help, but header guards are not free. I would
> rather not include a really large set of headers (say, windows.h, or stout)
> multiple times, expecting header guards to make them fast. I'd rather just
> include them once, in mesos_common.h. And this would also yield the
> greatest performance enhancement as well.
> >
> > I'm working on getting some hard numbers for a subset of Mesos. Once we
> have some hard comparisons with compiler performance (with and without
> PCH), we can address this much more practically.
> >
> > /Jeff
> >
> >
> > -----Original Message-----
> > From: Neil Conway [mailto:neil.conway@gmail.com]
> > Sent: Wednesday, February 15, 2017 11:13 AM
> > To: dev <de...@mesos.apache.org>
> > Subject: Re: Proposal for Mesos Build Improvements
> >
> > On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler <
> Jeff.Coffler@microsoft.com.invalid> wrote:
> >> For efficiency purposes, if a header file is included by 50% or more of
> the source files, it should be included in the precompiled header. If a
> header is included in fewer than 50% of the source files, then it can be
> separately included (and thus would not benefit from precompiled headers).
> Note that this is a guideline; even if a header is used by less than 50% of
> source files, if it's very large, we still may decide to throw it in the
> precompiled header.
> >
> > It seems like this would have the effect of creating many false
> > dependencies: if file X doesn't currently include header Y but Y is
> included in the precompiled header, the symbols in Y will now be visible
> when X is compiled. It would also mean that X would need to be recompiled
> when Y changes.
> >
> > Related: the current policy is that headers and implementation files
> should try to include all of their dependencies, without relying on
> transitive includes. For example, if foo.cpp includes bar.hpp, which
> includes <vector>, but foo.cpp also uses <vector>, both foo.cpp and bar.hpp
> should "#include <vector>". Adopting precompiled headers would mean making
> an exception to this policy, right?
> >
> > I wonder if we should instead use headers like:
> >
> > <- mesos_common.h ->
> > #include <a>
> > #include <b>
> > #include <c>
> >
> > <- xyz.cpp, which needs headers "b" and "d" ->
> > #include "mesos_common.h"
> >
> > #include <b>
> > #include <d>
> >
> > That way, the fact that "xyz.cpp" logically depends on <b> (but not <a>
> or <c>) is not obscured (in other words, Mesos should continue to compile
> if 'mesos_common.h' is replaced with an empty file). Does anyone know
> whether the header guard in <b> _should_ make the repeated inclusion of <b>
> relatively cheap?
> >
> > Neil
>

Re: Proposal for Mesos Build Improvements

Posted by Neil Conway <ne...@gmail.com>.
Hi Jeff,

Gotcha -- I just wanted to understand the tradeoffs here.

I'd definitely prefer an approach in which we include "<xyz>" in both
"mesos_common.hpp" and each individual file that logically depends on
"<xyz>". This makes clear the dependencies between modules and also
makes it easy to disable building with PCH (see also the
recommendations in [1]). If the only reason to avoid this is the cost of
the repeated include, it would be important to see benchmarks that
justify this.

BTW, I think it's important that we script/automate this as far as
possible, e.g., using a script to decide which headers are included
often enough to justify being included in the PCH. This should avoid
the PCH getting out of date, as well as innumerable arguments down the
road about whether header X warrants being added to the PCH :)
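
A minimal sketch of such a selection step, written in C++ for consistency with the code base: it counts how many source files include each header and keeps those at or above a threshold (the proposal's guideline was 50%). The function name `pchCandidates`, the in-memory input and the regular expression are assumptions for illustration; a real script would read files from disk and would also need to weigh header size.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns the headers included by at least `threshold` (0.0 to 1.0) of the
// given source files. Each element of `sources` is the text of one file.
std::vector<std::string> pchCandidates(
    const std::vector<std::string>& sources,
    double threshold)
{
  // Matches lines such as:  #include <vector>   or   #include "foo.hpp"
  static const std::regex includeLine(
      R"re(^\s*#\s*include\s*[<"]([^>"]+)[>"])re");

  // Header name -> number of source files that include it at least once.
  std::map<std::string, std::size_t> counts;

  for (const std::string& source : sources) {
    std::set<std::string> seen; // Count each header once per file.
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
      std::smatch match;
      if (std::regex_search(line, match, includeLine)) {
        seen.insert(match[1].str());
      }
    }
    for (const std::string& header : seen) {
      ++counts[header];
    }
  }

  const double required = threshold * static_cast<double>(sources.size());

  std::vector<std::string> result;
  for (const auto& entry : counts) {
    if (static_cast<double>(entry.second) >= required) {
      result.push_back(entry.first);
    }
  }
  return result; // Ordered by header name, since std::map iterates in order.
}
```

The output could be written into a generated `mesos_common.hpp` on each run, which would keep the list current without manual review. Note that the sketch only sees direct includes; transitive includes would need a preprocessor-based approach.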

Overall, sounds cool to me! Faster builds would be fantastic.

Neil

[1] http://gamesfromwithin.com/the-care-and-feeding-of-pre-compiled-headers

On Wed, Feb 15, 2017 at 11:26 AM, Jeff Coffler
<Je...@microsoft.com.invalid> wrote:
> Hi Neil,
>
> What you're saying is essentially correct. If mesos_common.h includes a bunch of, well, "common" stuff, and everybody includes mesos_common.h, then those files will, by definition, have at least some number of items that they didn't need.
>
> Since PCH works on both Windows and Linux, I don't think this is a "bad thing". It's a trade-off. Is a (what I believe to be) very significant speed-up in compile speed "worth it"? (Obviously, since I submitted the proposal, I think so. But this is a very valid point).
>
>  Yes, header guards will help, but header guards are not free. I would rather not include a really large set of headers (say, windows.h, or stout) multiple times, expecting header guards to make them fast. I'd rather just include them once, in mesos_common.h. And this would also yield the greatest performance enhancement as well.
>
> I'm working on getting some hard numbers for a subset of Mesos. Once we have some hard comparisons with compiler performance (with and without PCH), we can address this much more practically.
>
> /Jeff
>
>
> -----Original Message-----
> From: Neil Conway [mailto:neil.conway@gmail.com]
> Sent: Wednesday, February 15, 2017 11:13 AM
> To: dev <de...@mesos.apache.org>
> Subject: Re: Proposal for Mesos Build Improvements
>
> On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler <Je...@microsoft.com.invalid> wrote:
>> For efficiency purposes, if a header file is included by 50% or more of the source files, it should be included in the precompiled header. If a header is included in fewer than 50% of the source files, then it can be separately included (and thus would not benefit from precompiled headers). Note that this is a guideline; even if a header is used by less than 50% of source files, if it's very large, we still may decide to throw it in the precompiled header.
>
> It seems like this would have the effect of creating many false
> dependencies: if file X doesn't currently include header Y but Y is included in the precompiled header, the symbols in Y will now be visible when X is compiled. It would also mean that X would need to be recompiled when Y changes.
>
> Related: the current policy is that headers and implementation files should try to include all of their dependencies, without relying on transitive includes. For example, if foo.cpp includes bar.hpp, which includes <vector>, but foo.cpp also uses <vector>, both foo.cpp and bar.hpp should "#include <vector>". Adopting precompiled headers would mean making an exception to this policy, right?
>
> I wonder if we should instead use headers like:
>
> <- mesos_common.h ->
> #include <a>
> #include <b>
> #include <c>
>
> <- xyz.cpp, which needs headers "b" and "d" ->
> #include "mesos_common.h"
>
> #include <b>
> #include <d>
>
> That way, the fact that "xyz.cpp" logically depends on <b> (but not <a> or <c>) is not obscured (in other words, Mesos should continue to compile if 'mesos_common.h' is replaced with an empty file). Does anyone know whether the header guard in <b> _should_ make the repeated inclusion of <b> relatively cheap?
>
> Neil

RE: Proposal for Mesos Build Improvements

Posted by Jeff Coffler <Je...@microsoft.com.INVALID>.
Hi Neil,

What you're saying is essentially correct. If mesos_common.h includes a bunch of, well, "common" stuff, and everybody includes mesos_common.h, then those files will, by definition, have at least some number of items that they didn't need.

Since PCH works on both Windows and Linux, I don't think this is a "bad thing". It's a trade-off. Is a (what I believe to be) very significant speed-up in compile speed "worth it"? (Obviously, since I submitted the proposal, I think so. But this is a very valid point).

Yes, header guards will help, but header guards are not free. I would rather not include a really large set of headers (say, windows.h, or stout) multiple times, expecting header guards to make them fast. I'd rather just include them once, in mesos_common.h. This would also yield the greatest performance improvement.

I'm working on getting some hard numbers for a subset of Mesos. Once we have some hard comparisons with compiler performance (with and without PCH), we can address this much more practically.

/Jeff


-----Original Message-----
From: Neil Conway [mailto:neil.conway@gmail.com] 
Sent: Wednesday, February 15, 2017 11:13 AM
To: dev <de...@mesos.apache.org>
Subject: Re: Proposal for Mesos Build Improvements

On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler <Je...@microsoft.com.invalid> wrote:
> For efficiency purposes, if a header file is included by 50% or more of the source files, it should be included in the precompiled header. If a header is included in fewer than 50% of the source files, then it can be separately included (and thus would not benefit from precompiled headers). Note that this is a guideline; even if a header is used by less than 50% of source files, if it's very large, we still may decide to throw it in the precompiled header.

It seems like this would have the effect of creating many false
dependencies: if file X doesn't currently include header Y but Y is included in the precompiled header, the symbols in Y will now be visible when X is compiled. It would also mean that X would need to be recompiled when Y changes.

Related: the current policy is that headers and implementation files should try to include all of their dependencies, without relying on transitive includes. For example, if foo.cpp includes bar.hpp, which includes <vector>, but foo.cpp also uses <vector>, both foo.cpp and bar.hpp should "#include <vector>". Adopting precompiled headers would mean making an exception to this policy, right?

I wonder if we should instead use headers like:

<- mesos_common.h ->
#include <a>
#include <b>
#include <c>

<- xyz.cpp, which needs headers "b" and "d" ->
#include "mesos_common.h"

#include <b>
#include <d>

That way, the fact that "xyz.cpp" logically depends on <b> (but not <a> or <c>) is not obscured (in other words, Mesos should continue to compile if 'mesos_common.h' is replaced with an empty file). Does anyone know whether the header guard in <b> _should_ make the repeated inclusion of <b> relatively cheap?

Neil

Re: Proposal for Mesos Build Improvements

Posted by Alexander Rojas <al...@mesosphere.io>.
Actually, this is a policy I have never been a big fan of. In my experience, forward declaring as much as possible in the headers and only including in compilation units tends to yield decent improvements in compilation time, particularly for files like `mesos.cpp` or `slave.cpp`, which indirectly end up including almost every header in the project.
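
A small sketch of the forward-declaration approach, with the headers and the implementation file simulated in one listing. `Slave`, `Monitor` and the file names in the comments are hypothetical and not taken from the Mesos sources.

```cpp
#include <string>

// ---- monitor.hpp (hypothetical) ----
// Only a pointer to Slave is stored, so a forward declaration is enough.
// Files that include monitor.hpp do not pay for parsing slave.hpp.
class Slave;

class Monitor
{
public:
  explicit Monitor(const Slave* slave) : slave_(slave) {}

  std::string describe() const; // Defined in monitor.cpp.

private:
  const Slave* slave_;
};

// ---- slave.hpp (hypothetical) ----
class Slave
{
public:
  explicit Slave(const std::string& id) : id_(id) {}

  const std::string& id() const { return id_; }

private:
  std::string id_;
};

// ---- monitor.cpp (hypothetical) ----
// The full definition of Slave is needed only here, where its members are
// used, so this is the one place that would "#include slave.hpp".
std::string Monitor::describe() const
{
  return "monitoring " + slave_->id();
}
```

The trade-off is that this works only where the header needs the type by pointer or reference; members held by value and inline code that touches the type still require the full definition.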

Alexander Rojas
alexander@mesosphere.io




> On 15 Feb 2017, at 20:12, Neil Conway <ne...@gmail.com> wrote:
> 
> On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
> <Je...@microsoft.com.invalid> wrote:
>> For efficiency purposes, if a header file is included by 50% or more of the source files, it should be included in the precompiled header. If a header is included in fewer than 50% of the source files, then it can be separately included (and thus would not benefit from precompiled headers). Note that this is a guideline; even if a header is used by less than 50% of source files, if it's very large, we still may decide to throw it in the precompiled header.
> 
> It seems like this would have the effect of creating many false
> dependencies: if file X doesn't currently include header Y but Y is
> included in the precompiled header, the symbols in Y will now be
> visible when X is compiled. It would also mean that X would need to be
> recompiled when Y changes.
> 
> Related: the current policy is that headers and implementation files
> should try to include all of their dependencies, without relying on
> transitive includes. For example, if foo.cpp includes bar.hpp, which
> includes <vector>, but foo.cpp also uses <vector>, both foo.cpp and
> bar.hpp should "#include <vector>". Adopting precompiled headers would
> mean making an exception to this policy, right?
> 
> I wonder if we should instead use headers like:
> 
> <- mesos_common.h ->
> #include <a>
> #include <b>
> #include <c>
> 
> <- xyz.cpp, which needs headers "b" and "d" ->
> #include "mesos_common.h"
> 
> #include <b>
> #include <d>
> 
> That way, the fact that "xyz.cpp" logically depends on <b> (but not
> <a> or <c>) is not obscured (in other words, Mesos should continue to
> compile if 'mesos_common.h' is replaced with an empty file). Does
> anyone know whether the header guard in <b> _should_ make the repeated
> inclusion of <b> relatively cheap?
> 
> Neil


Re: Proposal for Mesos Build Improvements

Posted by Neil Conway <ne...@gmail.com>.
On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
<Je...@microsoft.com.invalid> wrote:
> For efficiency purposes, if a header file is included by 50% or more of the source files, it should be included in the precompiled header. If a header is included in fewer than 50% of the source files, then it can be separately included (and thus would not benefit from precompiled headers). Note that this is a guideline; even if a header is used by less than 50% of source files, if it's very large, we still may decide to throw it in the precompiled header.

It seems like this would have the effect of creating many false
dependencies: if file X doesn't currently include header Y but Y is
included in the precompiled header, the symbols in Y will now be
visible when X is compiled. It would also mean that X would need to be
recompiled when Y changes.

Related: the current policy is that headers and implementation files
should try to include all of their dependencies, without relying on
transitive includes. For example, if foo.cpp includes bar.hpp, which
includes <vector>, but foo.cpp also uses <vector>, both foo.cpp and
bar.hpp should "#include <vector>". Adopting precompiled headers would
mean making an exception to this policy, right?

I wonder if we should instead use headers like:

<- mesos_common.h ->
#include <a>
#include <b>
#include <c>

<- xyz.cpp, which needs headers "b" and "d" ->
#include "mesos_common.h"

#include <b>
#include <d>

That way, the fact that "xyz.cpp" logically depends on <b> (but not
<a> or <c>) is not obscured (in other words, Mesos should continue to
compile if 'mesos_common.h' is replaced with an empty file). Does
anyone know whether the header guard in <b> _should_ make the repeated
inclusion of <b> relatively cheap?

Neil

RE: Proposal for Mesos Build Improvements

Posted by Alex Clemmer <cl...@gmail.com>.
I don't think Joris is saying we should do this instead of PCHs, I think
he's just saying that it seems that a sizable performance improvement
can be made just by organizing Stout like a normal, not-header-only
library.

Taking stock briefly, based on what you've said here and elsewhere, it
seems that the main advantages of PCHs are:

1. Should speed up even "totally clean", non-ccached builds.
2. Removes the cost of parsing and compiling header files, and replaces
    it with the cost of accessing the cached version.
3. Works natively cross-platform, with no external tooling needed.

Is this correct? Have I left anything out?

If so, then it seems that the answer to Neil's question is that it
largely depends on how much time is spent parsing and compiling various
headers, no? If Stout were to be reorganized into a non-header-only
library, then we might reduce the cost of parsing, but if it is still
non-trivial, then it seems we would expect PCHs to improve the compile
times anyway.

If that's all right, then it seems difficult to answer Neil's question
directly.


__
Transcribed by my voice-enabled refrigerator, please pardon chilly messages.

On Wed, 15 Feb 2017, Jeff Coffler wrote:

> Yes, but this should be dramatically sped up with precompiled headers. Yes, scanning the headers takes (a lot) of time. But if you only scan them once due to precompiled headers, it no longer matters.
>
> I don't care if it takes 10 or even 15 or 20 seconds to scan all the headers. If you only do it once, the compile time is sped up dramatically.
>
> If you're not doing PCH work, then I guess it could make sense to simplify the headers to the greatest extent possible. But you can only do that so much; with C++ template use for example, sometimes the implementation MUST be in header files.
>
> /Jeff
>
> -----Original Message-----
> From: Joris Van Remoortere [mailto:joris@mesosphere.io]
> Sent: Wednesday, February 15, 2017 9:46 AM
> To: dev@mesos.apache.org
> Subject: Re: Proposal for Mesos Build Improvements
>
>>
>> However, the non-header-only work won't do anything in a "clean build"
>> scenario.
>
> I don't think this is true.
>
> If you look at how many independent .o files we build that scan those headers each time it should be clear that reducing the complexity of the header file reduces the compile time.
> Good examples of heavy .o files are the Mesos tests, which scan close to all of stout / libprocess for each test file.
>
> --
> *Joris Van Remoortere*
> Mesosphere
>
> On Tue, Feb 14, 2017 at 4:49 PM, Jeff Coffler < Jeff.Coffler@microsoft.com.invalid> wrote:
>
>> Hi Neil,
>>
>> This was discussed in the CXX Mesos Slack channel yesterday.
>>
>> Basically, the two are separate and independent. Regardless of stout
>> work, I anticipate that PCH work will dramatically speed up the
>> Windows build (and Linux too, although I have less experience in that
>> area). I'm going to run some benchmarks on a subset of the code to give a good "before/after"
>> idea of the speedup and report to the list.
>>
>> If stout non-header-only library work is done, this will do a fair
>> amount to speed up incremental builds (i.e. you just update
>> implementation of a stout method, and only the related C file is
>> rebuilt). However, the non-header-only work won't do anything in a
>> "clean build" scenario. And, of course, if you change the interface of
>> a stout method, all bets are off and you get to rebuild virtually the world.
>>
>> PCH, on the other hand, will speed up all compiles across the board
>> (using stout and not using stout). Now, that said, if a stout change
>> is made (assuming still header-only), you will still rebuild
>> everything, but the builds will go much faster. That *may* be fast
>> enough to take the sting out of significant stout changes, but
>> changing stout will still help the incremental build cases regardless.
>>
>> Hope that clarifies,
>>
>> /Jeff
>>
>> -----Original Message-----
>> From: Neil Conway [mailto:neil.conway@gmail.com]
>> Sent: Tuesday, February 14, 2017 11:45 AM
>> To: dev <de...@mesos.apache.org>
>> Subject: Re: Proposal for Mesos Build Improvements
>>
>> I'm curious to hear more about how using PCH compares with making
>> stout a non-header-only library. Is PCH easier to implement, or is it
>> expected to offer a more dramatic improvement in compile times? Would
>> making both changes eventually make sense?
>>
>> Neil
>>
>> On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
>> <Jeff.Coffler@microsoft.com .invalid> wrote:
>>> Proposal For Build Improvements
>>>
>>> The Mesos build process is in dire need of some build infrastructure
>> improvements. These improvements will improve speed and ease of work
>> in particular components, and dramatically improve overall build time,
>> especially in the Windows environment, but likely in the Linux
>> environment as well.
>>>
>>>
>>> Background:
>>>
>>> It is currently recommended to use the ccache project with the Mesos
>> build process. This makes the Linux build process more tolerable in
>> terms of speed, but unfortunately such software is not available on Windows.
>> Ultimately, though, the caching software is covering up two
>> fundamental flaws in the overall build process:
>>>
>>> 1. Lack of use of libraries
>>> 2. Lack of precompiled headers
>>>
>>> By not allowing use of libraries, the overall build process is often
>> much longer, particularly when a lot of work is being done in a
>> particular component. If work is being done in a particular component,
>> only that library need be rebuilt (and then the overall image
>> relinked). Currently, since there is no such modularization, all
>> source files must be considered at build time. Interestingly enough,
>> there is such modularization in the source code layout; that
>> modularization just isn't utilized at the compiler level.
>>>
>>> Precompiled headers exist on both Windows and Linux. For Linux, you
>> can refer to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html.
>> Straight from the GNU CC documentation: "The time
>> the compiler takes to process these header files over and over again
>> can account for nearly all of the time required to build the project."
>>>
>>> In my prior use of precompiled headers, each C or C++ file generally
>> took about 4 seconds to compile. After switching to precompiled
>> headers, the precompiled header creation took about 4 seconds, but
>> each C/C++ file now took about 200 milliseconds to compile. The
>> overall build time was thus dramatically reduced.
>>>
>>>
>>> Scope of Changes:
>>>
>>> These changes are only being proposed for the CMake system. Going
>> forward, the CMake system is the easiest way to maintain some level of
>> portability between the Linux and Windows platforms.
>>>
>>>
>>> Details for Modularization:
>>>
>>> For the modularization, the intent is to simply make each source
>> directory of files, if functionally separate, to be compiled into an
>> archive (.a) file. These archive files will then be linked together to
>> form the actual executables. These changes will primarily be in the
>> CMake system, and should have limited effect on any actual source code.
>>>
>>> At a later date, if it makes sense, we can look at building shared
>> library (.so) files. However, this only makes the most sense if the
>> code is truly shared between different executable files. If that's not
>> the case, then it likely makes sense just to stick with .a files.
>> Regardless, generation of .so files is out of scope for this change.
>>>
>>>
>>> Details for Precompiled Header Changes:
>>>
>>> Precompiled headers will make use of stout (a very large header-only
>> library) essentially "free" from a compile-time overhead point of view.
>> Basically, precompiled headers will take a list of header files
>> (including very long header files, like "windows.h"), and generate the
>> compiler memory structures for their representation.
>>>
>>> During precompiled header generation, these memory structures are
>> flushed to disk. Then, when components are built, the memory
>> structures are reloaded from disk, which is dramatically faster than
>> actually parsing the tens of thousands of lines of header files and
>> building the memory structures.
>>>
>>> For precompiled headers to be useful, a relatively "consistent" set
>>> of
>> headers must be included by all of the C/C++ files. So, for example,
>> consider the following C file:
>>>
>>> #if defined(windows)
>>> #include <windows.h>
>>> #endif
>>>
>>> #include <header-a>
>>> #include <header-b>
>>> #include <header-c>
>>>
>>> < - Remainder of module - >
>>>
>>> To make a precompiled header for this module, all of the #include
>>> files
>> would be included in a new file, mesos_common.h. The C file would then
>> be changed as follows:
>>>
>>> #include "mesos_common.h"
>>>
>>> < - Remainder of module - >
>>>
>>> Structurally, the code is identical, and need not be built with
>> precompiled headers. However, use of precompiled headers will make
>> file compilation dramatically faster.
>>>
>>> Note that other include files can be included after the precompiled
>> header if appropriate. For example, the following is valid:
>>>
>>> #include "mesos_common.h"
>>> #include <header-d>
>>>
>>> < - Remainder of module - >
>>>
>>> For efficiency purposes, if a header file is included by 50% or more
>>> of
>> the source files, it should be included in the precompiled header. If
>> a header is included in fewer than 50% of the source files, then it
>> can be separately included (and thus would not benefit from precompiled headers).
>> Note that this is a guideline; even if a header is used by less than
>> 50% of source files, if it's very large, we still may decide to throw
>> it in the precompiled header.
>>>
>>> Note that, for use of precompiled headers, there will be a great
>>> deal of code churn (almost exclusively in the #include list of
>>> source files). This will mean that there will be a lot of code
>>> merges, but ultimately no "code logic" changes. If merges are not
>>> done in a timely fashion, this can easily result in needless hand merging of changes.
>>> Due to these issues, we will need a dedicated shepherd that will
>>> integrate the patches quickly. This kind of work is easily
>>> invalidated when the include list is changed by another developer,
>>> necessitating us to redo the patch. [Note that Joseph has stepped up
>>> to the plate for this, thanks Joseph!]
>>>
>>>
>>> This is the end of my proposal, feedback would be appreciated.
>>
>

RE: Proposal for Mesos Build Improvements

Posted by Jeff Coffler <Je...@microsoft.com.INVALID>.
Yes, but this should be dramatically sped up with precompiled headers. Yes, scanning the headers takes a lot of time. But if you only scan them once thanks to precompiled headers, it no longer matters.

I don't care if it takes 10 or even 15 or 20 seconds to scan all the headers. If you only do it once, the compile time is sped up dramatically.
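
To make the arithmetic concrete, here is a back-of-the-envelope model using the figures from the original proposal (about 4 s per file without PCH; about 4 s to build the PCH and about 200 ms per file with it). It assumes a purely serial build and ignores linking, parallel jobs and ccache, so it illustrates the reasoning rather than predicting Mesos build times. The constant and function names are made up.

```cpp
// All times are in milliseconds.
const long long kPerFileWithoutPch = 4000; // ~4 s per file, headers reparsed.
const long long kPchCreation = 4000;       // ~4 s, paid once per build.
const long long kPerFileWithPch = 200;     // ~200 ms per file.

// Total serial compile time when every file parses the headers itself.
long long compileTimeWithoutPch(long long files)
{
  return files * kPerFileWithoutPch;
}

// Total serial compile time when the headers are precompiled once.
long long compileTimeWithPch(long long files)
{
  return kPchCreation + files * kPerFileWithPch;
}
```

With these numbers, 100 files come to 400 s without PCH and 24 s with it, while a single file is slightly slower with PCH (4.2 s versus 4 s). The one-time cost only pays off once it is spread over more than one file.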

If you're not doing PCH work, then I guess it could make sense to simplify the headers to the greatest extent possible. But you can only do that so much; with C++ templates, for example, the implementation sometimes MUST be in header files.
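
The template constraint can be illustrated as follows, with a header and a source file simulated in one listing. The `maxOf` and `checksum` functions and the file names are hypothetical. A function template is normally defined in the header because every translation unit that instantiates it needs to see the body, whereas an ordinary function can be declared in the header and defined once in a `.cpp` file.

```cpp
#include <string>

// ---- util.hpp (hypothetical) ----

// Template: the definition stays in the header so that each translation unit
// can instantiate it for the types it uses (int, std::string, ...).
template <typename T>
const T& maxOf(const T& left, const T& right)
{
  return (left < right) ? right : left;
}

// Ordinary function: a declaration is all that other files need to parse.
unsigned int checksum(const std::string& data);

// ---- util.cpp (hypothetical) ----

// Compiled once into a library; editing this body does not force files that
// include util.hpp to be recompiled.
unsigned int checksum(const std::string& data)
{
  unsigned int sum = 0;
  for (unsigned char c : data) {
    sum += c;
  }
  return sum;
}
```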

/Jeff

-----Original Message-----
From: Joris Van Remoortere [mailto:joris@mesosphere.io] 
Sent: Wednesday, February 15, 2017 9:46 AM
To: dev@mesos.apache.org
Subject: Re: Proposal for Mesos Build Improvements

>
> However, the non-header-only work won't do anything in a "clean build"
> scenario.

I don't think this is true.

If you look at how many independent .o files we build that scan those headers each time, it should be clear that reducing the complexity of the header files reduces the compile time.
Good examples of heavy .o files are the Mesos tests, which scan close to all of stout / libprocess for each test file.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Feb 14, 2017 at 4:49 PM, Jeff Coffler < Jeff.Coffler@microsoft.com.invalid> wrote:

> Hi Neil,
>
> This was discussed in the CXX Mesos Slack channel yesterday.
>
> Basically, the two are separate and independent. Regardless of stout 
> work, I anticipate that PCH work will dramatically speed up the 
> Windows build (and Linux too, although I have less experience in that 
> area). I'm going to run some benchmarks on a subset of the code to give a good "before/after"
> idea of the speedup and report to the list.
>
> If stout non-header-only library work is done, this will do a fair 
> amount to speed up incremental builds (i.e. you just update 
> implementation of a stout method, and only the related C file is 
> rebuilt). However, the non-header-only work won't do anything in a
> "clean build" scenario. And, of course, if you change the interface of
> a stout method, all bets are off and you get to rebuild virtually the world.
>
> PCH, on the other hand, will speed up all compiles across the board 
> (using stout and not using stout). Now, that said, if a stout change 
> is made (assuming still header-only), you will still rebuild 
> everything, but the builds will go much faster. That *may* be fast 
> enough to take the sting out of significant stout changes, but 
> changing stout will still help the incremental build cases regardless.
>
> Hope that clarifies,
>
> /Jeff
>
> -----Original Message-----
> From: Neil Conway [mailto:neil.conway@gmail.com]
> Sent: Tuesday, February 14, 2017 11:45 AM
> To: dev <de...@mesos.apache.org>
> Subject: Re: Proposal for Mesos Build Improvements
>
> I'm curious to hear more about how using PCH compares with making 
> stout a non-header-only library. Is PCH easier to implement, or is it 
> expected to offer a more dramatic improvement in compile times? Would 
> making both changes eventually make sense?
>
> Neil
>
> On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler 
> <Jeff.Coffler@microsoft.com.invalid> wrote:
> > Proposal For Build Improvements
> >
> > The Mesos build process is in dire need of some build infrastructure
> improvements. These improvements will improve speed and ease of work 
> in particular components, and dramatically improve overall build time, 
> especially in the Windows environment, but likely in the Linux 
> environment as well.
> >
> >
> > Background:
> >
> > It is currently recommended to use the ccache project with the Mesos
> build process. This makes the Linux build process more tolerable in 
> terms of speed, but unfortunately such software is not available on Windows.
> Ultimately, though, the caching software is covering up two 
> fundamental flaws in the overall build process:
> >
> > 1. Lack of use of libraries
> > 2. Lack of precompiled headers
> >
> > By not allowing use of libraries, the overall build process is often
> much longer, particularly when a lot of work is being done in a 
> particular component. If work is being done in a particular component, 
> only that library need be rebuilt (and then the overall image 
> relinked). Currently, since there is no such modularization, all 
> source files must be considered at build time. Interestingly enough, 
> there is such modularization in the source code layout; that 
> modularization just isn't utilized at the compiler level.
> >
> > Precompiled headers exist on both Windows and Linux. For Linux, you
> > can refer to
> > https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html.
> > Straight from the GNU CC documentation: "The time the compiler takes
> > to process these header files over and over again can account for
> > nearly all of the time required to build the project."
> >
> > In my prior use of precompiled headers, each C or C++ file generally
> took about 4 seconds to compile. After switching to precompiled 
> headers, the precompiled header creation took about 4 seconds, but 
> each C/C++ file now took about 200 milliseconds to compile. The 
> overall build time was thus dramatically reduced.
> >
> >
> > Scope of Changes:
> >
> > These changes are only being proposed for the CMake system. Going
> forward, the CMake system is the easiest way to maintain some level of 
> portability between the Linux and Windows platforms.
> >
> >
> > Details for Modularization:
> >
> > For the modularization, the intent is simply to compile each source
> directory of files, if functionally separate, into an archive (.a) 
> file. These archive files will then be linked together to 
> form the actual executables. These changes will primarily be in the 
> CMake system, and should have limited effect on any actual source code.
> >
> > At a later date, if it makes sense, we can look at building shared
> library (.so) files. However, this only makes the most sense if the 
> code is truly shared between different executable files. If that's not 
> the case, then it likely makes sense just to stick with .a files. 
> Regardless, generation of .so files is out of scope for this change.
> >
> >
> > Details for Precompiled Header Changes:
> >
> > Precompiled headers will make use of stout (a very large header-only
> library) essentially "free" from a compile-time overhead point of view.
> Basically, precompiled headers will take a list of header files 
> (including very long header files, like "windows.h"), and generate the 
> compiler memory structures for their representation.
> >
> > During precompiled header generation, these memory structures are
> flushed to disk. Then, when components are built, the memory 
> structures are reloaded from disk, which is dramatically faster than 
> actually parsing the tens of thousands of lines of header files and 
> building the memory structures.
> >
> > For precompiled headers to be useful, a relatively "consistent" set 
> > of
> headers must be included by all of the C/C++ files. So, for example, 
> consider the following C file:
> >
> > #if defined(windows)
> > #include <windows.h>
> > #endif
> >
> > #include <header-a>
> > #include <header-b>
> > #include <header-c>
> >
> > < - Remainder of module - >
> >
> > To make a precompiled header for this module, all of the #include 
> > files
> would be included in a new file, mesos_common.h. The C file would then 
> be changed as follows:
> >
> > #include "mesos_common.h"
> >
> > < - Remainder of module - >
> >
> > Structurally, the code is identical, and need not be built with
> precompiled headers. However, use of precompiled headers will make 
> file compilation dramatically faster.
> >
> > Note that other include files can be included after the precompiled
> header if appropriate. For example, the following is valid:
> >
> > #include "mesos_common.h"
> > #include <header-d>
> >
> > < - Remainder of module - >
> >
> > For efficiency purposes, if a header file is included by 50% or more 
> > of
> the source files, it should be included in the precompiled header. If 
> a header is included in fewer than 50% of the source files, then it 
> can be separately included (and thus would not benefit from precompiled headers).
> Note that this is a guideline; even if a header is used by less than 
> 50% of source files, if it's very large, we still may decide to throw 
> it in the precompiled header.
> >
> > Note that, for use of precompiled headers, there will be a great 
> > deal of code churn (almost exclusively in the #include list of 
> > source files). This will mean that there will be a lot of code 
> > merges, but ultimately no "code logic" changes. If merges are not 
> > done in a timely fashion, this can easily result in needless hand merging of changes.
> > Due to these issues, we will need a dedicated shepherd who will 
> > integrate the patches quickly. This kind of work is easily 
> > invalidated when the include list is changed by another developer, 
> > requiring us to redo the patch. [Note that Joseph has stepped up 
> > to the plate for this, thanks Joseph!]
> >
> >
> > This is the end of my proposal, feedback would be appreciated.
>

Re: Proposal for Mesos Build Improvements

Posted by Joris Van Remoortere <jo...@mesosphere.io>.
>
> However, the non-header-only work won't do anything in a "clean build"
> scenario.

I don't think this is true.

If you look at how many independent .o files we build that scan those
headers each time, it should be clear that reducing the complexity of the
header files reduces the compile time.
A good example of heavy .o files is the Mesos tests, which scan close to all
of stout / libprocess for each test file.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Feb 14, 2017 at 4:49 PM, Jeff Coffler <
Jeff.Coffler@microsoft.com.invalid> wrote:

> Hi Neil,
>
> This was discussed in the CXX Mesos Slack channel yesterday.
>
> Basically, the two are separate and independent. Regardless of stout work,
> I anticipate that PCH work will dramatically speed up the Windows build
> (and Linux too, although I have less experience in that area). I'm going to
> run some benchmarks on a subset of the code to give a good "before/after"
> idea of the speedup and report to the list.
>
> If stout non-header-only library work is done, this will do a fair amount
> to speed up incremental builds (i.e. you just update implementation of a
> stout method, and only the related C file is rebuilt). However, the
> non-header-only work won't do anything in a "clean build" scenario. And, of
> course, if you change the interface of a stout method, all bets are off and
> you get to rebuild virtually the world.
>
> PCH, on the other hand, will speed up all compiles across the board (using
> stout and not using stout). Now, that said, if a stout change is made
> (assuming still header-only), you will still rebuild everything, but the
> builds will go much faster. That *may* be fast enough to take the sting out
> of significant stout changes, but changing stout will still help the
> incremental build cases regardless.
>
> Hope that clarifies,
>
> /Jeff
>
> -----Original Message-----
> From: Neil Conway [mailto:neil.conway@gmail.com]
> Sent: Tuesday, February 14, 2017 11:45 AM
> To: dev <de...@mesos.apache.org>
> Subject: Re: Proposal for Mesos Build Improvements
>
> I'm curious to hear more about how using PCH compares with making stout a
> non-header-only library. Is PCH easier to implement, or is it expected to
> offer a more dramatic improvement in compile times? Would making both
> changes eventually make sense?
>
> Neil
>
> On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
> <Jeff.Coffler@microsoft.com.invalid> wrote:
> > Proposal For Build Improvements
> >
> > The Mesos build process is in dire need of some build infrastructure
> improvements. These improvements will improve speed and ease of work in
> particular components, and dramatically improve overall build time,
> especially in the Windows environment, but likely in the Linux environment
> as well.
> >
> >
> > Background:
> >
> > It is currently recommended to use the ccache project with the Mesos
> build process. This makes the Linux build process more tolerable in terms
> of speed, but unfortunately such software is not available on Windows.
> Ultimately, though, the caching software is covering up two fundamental
> flaws in the overall build process:
> >
> > 1. Lack of use of libraries
> > 2. Lack of precompiled headers
> >
> > By not allowing use of libraries, the overall build process is often
> much longer, particularly when a lot of work is being done in a particular
> component. If work is being done in a particular component, only that
> library need be rebuilt (and then the overall image relinked). Currently,
> since there is no such modularization, all source files must be considered
> at build time. Interestingly enough, there is such modularization in the
> source code layout; that modularization just isn't utilized at the compiler
> level.
> >
> > Precompiled headers exist on both Windows and Linux. For Linux, you can
> > refer to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html.
> > Straight from the GNU CC documentation: "The time the compiler takes to
> > process these header files over and over again can account for nearly
> > all of the time required to build the project."
> >
> > In my prior use of precompiled headers, each C or C++ file generally
> took about 4 seconds to compile. After switching to precompiled headers,
> the precompiled header creation took about 4 seconds, but each C/C++ file
> now took about 200 milliseconds to compile. The overall build time was
> thus dramatically reduced.
> >
> >
> > Scope of Changes:
> >
> > These changes are only being proposed for the CMake system. Going
> forward, the CMake system is the easiest way to maintain some level of
> portability between the Linux and Windows platforms.
> >
> >
> > Details for Modularization:
> >
> > For the modularization, the intent is simply to compile each source
> directory of files, if functionally separate, into an archive (.a) file.
> These archive files will then be linked together to form
> the actual executables. These changes will primarily be in the CMake
> system, and should have limited effect on any actual source code.
> >
> > At a later date, if it makes sense, we can look at building shared
> library (.so) files. However, this only makes the most sense if the code is
> truly shared between different executable files. If that's not the case,
> then it likely makes sense just to stick with .a files. Regardless,
> generation of .so files is out of scope for this change.
> >
> >
> > Details for Precompiled Header Changes:
> >
> > Precompiled headers will make use of stout (a very large header-only
> library) essentially "free" from a compile-time overhead point of view.
> Basically, precompiled headers will take a list of header files (including
> very long header files, like "windows.h"), and generate the compiler memory
> structures for their representation.
> >
> > During precompiled header generation, these memory structures are
> flushed to disk. Then, when components are built, the memory structures are
> reloaded from disk, which is dramatically faster than actually parsing the
> tens of thousands of lines of header files and building the memory
> structures.
> >
> > For precompiled headers to be useful, a relatively "consistent" set of
> headers must be included by all of the C/C++ files. So, for example,
> consider the following C file:
> >
> > #if defined(windows)
> > #include <windows.h>
> > #endif
> >
> > #include <header-a>
> > #include <header-b>
> > #include <header-c>
> >
> > < - Remainder of module - >
> >
> > To make a precompiled header for this module, all of the #include files
> would be included in a new file, mesos_common.h. The C file would then be
> changed as follows:
> >
> > #include "mesos_common.h"
> >
> > < - Remainder of module - >
> >
> > Structurally, the code is identical, and need not be built with
> precompiled headers. However, use of precompiled headers will make file
> compilation dramatically faster.
> >
> > Note that other include files can be included after the precompiled
> header if appropriate. For example, the following is valid:
> >
> > #include "mesos_common.h"
> > #include <header-d>
> >
> > < - Remainder of module - >
> >
> > For efficiency purposes, if a header file is included by 50% or more of
> the source files, it should be included in the precompiled header. If a
> header is included in fewer than 50% of the source files, then it can be
> separately included (and thus would not benefit from precompiled headers).
> Note that this is a guideline; even if a header is used by less than 50% of
> source files, if it's very large, we still may decide to throw it in the
> precompiled header.
> >
> > Note that, for use of precompiled headers, there will be a great deal
> > of code churn (almost exclusively in the #include list of source
> > files). This will mean that there will be a lot of code merges, but
> > ultimately no "code logic" changes. If merges are not done in a timely
> > fashion, this can easily result in needless hand merging of changes.
> > Due to these issues, we will need a dedicated shepherd who will
> > integrate the patches quickly. This kind of work is easily invalidated
> > when the include list is changed by another developer, requiring
> > us to redo the patch. [Note that Joseph has stepped up to the plate
> > for this, thanks Joseph!]
> >
> >
> > This is the end of my proposal, feedback would be appreciated.
>

RE: Proposal for Mesos Build Improvements

Posted by Jeff Coffler <Je...@microsoft.com.INVALID>.
Hi Neil,

This was discussed in the CXX Mesos Slack channel yesterday.

Basically, the two are separate and independent. Regardless of stout work, I anticipate that PCH work will dramatically speed up the Windows build (and Linux too, although I have less experience in that area). I'm going to run some benchmarks on a subset of the code to give a good "before/after" idea of the speedup and report to the list.

If stout non-header-only library work is done, this will do a fair amount to speed up incremental builds (i.e. you just update the implementation of a stout method, and only the related C file is rebuilt). However, the non-header-only work won't do anything in a "clean build" scenario. And, of course, if you change the interface of a stout method, all bets are off and you get to rebuild virtually the world.
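
As a rough illustration of the incremental-build point (a sketch only; `example::join` and the file names in the comments are hypothetical, not actual stout code): in a header-only layout the function body lives in the header, so editing it touches every file that includes it. Once the code is split into a declaration and a separate definition, an implementation-only edit recompiles a single source file. Both pieces are shown in one listing so that it compiles on its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// --- would live in a header, e.g. example/join.hpp (hypothetical name) ---
// Declaration only: files that include this see just the interface.
namespace example {
std::string join(const std::vector<std::string>& parts,
                 const std::string& separator);
} // namespace example

// --- would live in its own source file, e.g. example/join.cpp (hypothetical) ---
// Definition: editing this body recompiles only this one file. Changing the
// signature above still forces every dependent file to rebuild.
namespace example {
std::string join(const std::vector<std::string>& parts,
                 const std::string& separator)
{
  std::string result;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) {
      result += separator;
    }
    result += parts[i];
  }
  return result;
}
} // namespace example
```

Templates and inline functions would still have to stay in headers, so a split like this only applies to the non-template parts of a library.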

PCH, on the other hand, will speed up all compiles across the board (using stout and not using stout). Now, that said, if a stout change is made (assuming still header-only), you will still rebuild everything, but the builds will go much faster. That *may* be fast enough to take the sting out of significant stout changes, but changing stout will still help the incremental build cases regardless.

Hope that clarifies,

/Jeff

-----Original Message-----
From: Neil Conway [mailto:neil.conway@gmail.com] 
Sent: Tuesday, February 14, 2017 11:45 AM
To: dev <de...@mesos.apache.org>
Subject: Re: Proposal for Mesos Build Improvements

I'm curious to hear more about how using PCH compares with making stout a non-header-only library. Is PCH easier to implement, or is it expected to offer a more dramatic improvement in compile times? Would making both changes eventually make sense?

Neil

On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler <Je...@microsoft.com.invalid> wrote:
> Proposal For Build Improvements
>
> The Mesos build process is in dire need of some build infrastructure improvements. These improvements will improve speed and ease of work in particular components, and dramatically improve overall build time, especially in the Windows environment, but likely in the Linux environment as well.
>
>
> Background:
>
> It is currently recommended to use the ccache project with the Mesos build process. This makes the Linux build process more tolerable in terms of speed, but unfortunately such software is not available on Windows. Ultimately, though, the caching software is covering up two fundamental flaws in the overall build process:
>
> 1. Lack of use of libraries
> 2. Lack of precompiled headers
>
> By not allowing use of libraries, the overall build process is often much longer, particularly when a lot of work is being done in a particular component. If work is being done in a particular component, only that library need be rebuilt (and then the overall image relinked). Currently, since there is no such modularization, all source files must be considered at build time. Interestingly enough, there is such modularization in the source code layout; that modularization just isn't utilized at the compiler level.
>
> Precompiled headers exist on both Windows and Linux. For Linux, you can refer to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight from the GNU CC documentation: "The time the compiler takes to process these header files over and over again can account for nearly all of the time required to build the project."
>
> In my prior use of precompiled headers, each C or C++ file generally took about 4 seconds to compile. After switching to precompiled headers, the precompiled header creation took about 4 seconds, but each C/C++ file now took about 200 milliseconds to compile. The overall build time was thus dramatically reduced.
>
>
> Scope of Changes:
>
> These changes are only being proposed for the CMake system. Going forward, the CMake system is the easiest way to maintain some level of portability between the Linux and Windows platforms.
>
>
> Details for Modularization:
>
> For the modularization, the intent is simply to compile each source directory of files, if functionally separate, into an archive (.a) file. These archive files will then be linked together to form the actual executables. These changes will primarily be in the CMake system, and should have limited effect on any actual source code.
>
> At a later date, if it makes sense, we can look at building shared library (.so) files. However, this only makes the most sense if the code is truly shared between different executable files. If that's not the case, then it likely makes sense just to stick with .a files. Regardless, generation of .so files is out of scope for this change.
>
>
> Details for Precompiled Header Changes:
>
> Precompiled headers will make use of stout (a very large header-only library) essentially "free" from a compile-time overhead point of view. Basically, precompiled headers will take a list of header files (including very long header files, like "windows.h"), and generate the compiler memory structures for their representation.
>
> During precompiled header generation, these memory structures are flushed to disk. Then, when components are built, the memory structures are reloaded from disk, which is dramatically faster than actually parsing the tens of thousands of lines of header files and building the memory structures.
>
> For precompiled headers to be useful, a relatively "consistent" set of headers must be included by all of the C/C++ files. So, for example, consider the following C file:
>
> #if defined(windows)
> #include <windows.h>
> #endif
>
> #include <header-a>
> #include <header-b>
> #include <header-c>
>
> < - Remainder of module - >
>
> To make a precompiled header for this module, all of the #include files would be included in a new file, mesos_common.h. The C file would then be changed as follows:
>
> #include "mesos_common.h"
>
> < - Remainder of module - >
>
> Structurally, the code is identical, and need not be built with precompiled headers. However, use of precompiled headers will make file compilation dramatically faster.
>
> Note that other include files can be included after the precompiled header if appropriate. For example, the following is valid:
>
> #include "mesos_common.h"
> #include <header-d>
>
> < - Remainder of module - >
>
> For efficiency purposes, if a header file is included by 50% or more of the source files, it should be included in the precompiled header. If a header is included in fewer than 50% of the source files, then it can be separately included (and thus would not benefit from precompiled headers). Note that this is a guideline; even if a header is used by less than 50% of source files, if it's very large, we still may decide to throw it in the precompiled header.
>
> Note that, for use of precompiled headers, there will be a great deal 
> of code churn (almost exclusively in the #include list of source 
> files). This will mean that there will be a lot of code merges, but 
> ultimately no "code logic" changes. If merges are not done in a timely 
> fashion, this can easily result in needless hand merging of changes. 
> Due to these issues, we will need a dedicated shepherd who will 
> integrate the patches quickly. This kind of work is easily invalidated 
> when the include list is changed by another developer, requiring 
> us to redo the patch. [Note that Joseph has stepped up to the plate 
> for this, thanks Joseph!]
>
>
> This is the end of my proposal, feedback would be appreciated.

Re: Proposal for Mesos Build Improvements

Posted by Neil Conway <ne...@gmail.com>.
I'm curious to hear more about how using PCH compares with making
stout a non-header-only library. Is PCH easier to implement, or is it
expected to offer a more dramatic improvement in compile times? Would
making both changes eventually make sense?

Neil

On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
<Je...@microsoft.com.invalid> wrote:
> Proposal For Build Improvements
>
> The Mesos build process is in dire need of some build infrastructure improvements. These improvements will improve speed and ease of work in particular components, and dramatically improve overall build time, especially in the Windows environment, but likely in the Linux environment as well.
>
>
> Background:
>
> It is currently recommended to use the ccache project with the Mesos build process. This makes the Linux build process more tolerable in terms of speed, but unfortunately such software is not available on Windows. Ultimately, though, the caching software is covering up two fundamental flaws in the overall build process:
>
> 1. Lack of use of libraries
> 2. Lack of precompiled headers
>
> By not allowing use of libraries, the overall build process is often much longer, particularly when a lot of work is being done in a particular component. If work is being done in a particular component, only that library need be rebuilt (and then the overall image relinked). Currently, since there is no such modularization, all source files must be considered at build time. Interestingly enough, there is such modularization in the source code layout; that modularization just isn't utilized at the compiler level.
>
> Precompiled headers exist on both Windows and Linux. For Linux, you can refer to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight from the GNU CC documentation: "The time the compiler takes to process these header files over and over again can account for nearly all of the time required to build the project."
>
> In my prior use of precompiled headers, each C or C++ file generally took about 4 seconds to compile. After switching to precompiled headers, the precompiled header creation took about 4 seconds, but each C/C++ file now took about 200 milliseconds to compile. The overall build time was thus dramatically reduced.
>
>
> Scope of Changes:
>
> These changes are only being proposed for the CMake system. Going forward, the CMake system is the easiest way to maintain some level of portability between the Linux and Windows platforms.
>
>
> Details for Modularization:
>
> For the modularization, the intent is simply to compile each source directory of files, if functionally separate, into an archive (.a) file. These archive files will then be linked together to form the actual executables. These changes will primarily be in the CMake system, and should have limited effect on any actual source code.
>
> At a later date, if it makes sense, we can look at building shared library (.so) files. However, this only makes the most sense if the code is truly shared between different executable files. If that's not the case, then it likely makes sense just to stick with .a files. Regardless, generation of .so files is out of scope for this change.
>
>
> Details for Precompiled Header Changes:
>
> Precompiled headers will make use of stout (a very large header-only library) essentially "free" from a compile-time overhead point of view. Basically, precompiled headers will take a list of header files (including very long header files, like "windows.h"), and generate the compiler memory structures for their representation.
>
> During precompiled header generation, these memory structures are flushed to disk. Then, when components are built, the memory structures are reloaded from disk, which is dramatically faster than actually parsing the tens of thousands of lines of header files and building the memory structures.
>
> For precompiled headers to be useful, a relatively "consistent" set of headers must be included by all of the C/C++ files. So, for example, consider the following C file:
>
> #if defined(windows)
> #include <windows.h>
> #endif
>
> #include <header-a>
> #include <header-b>
> #include <header-c>
>
> < - Remainder of module - >
>
> To make a precompiled header for this module, all of the #include files would be included in a new file, mesos_common.h. The C file would then be changed as follows:
>
> #include "mesos_common.h"
>
> < - Remainder of module - >
>
> Structurally, the code is identical, and need not be built with precompiled headers. However, use of precompiled headers will make file compilation dramatically faster.
>
> Note that other include files can be included after the precompiled header if appropriate. For example, the following is valid:
>
> #include "mesos_common.h"
> #include <header-d>
>
> < - Remainder of module - >
>
> For efficiency purposes, if a header file is included by 50% or more of the source files, it should be included in the precompiled header. If a header is included in fewer than 50% of the source files, then it can be separately included (and thus would not benefit from precompiled headers). Note that this is a guideline; even if a header is used by less than 50% of source files, if it's very large, we still may decide to throw it in the precompiled header.
>
> Note that, for use of precompiled headers, there will be a great deal of code churn (almost exclusively in the #include list of source files). This will mean that there will be a lot of code merges, but ultimately no "code logic" changes. If merges are not done in a timely fashion, this can easily result in needless hand merging of changes. Due to these issues, we will need a dedicated shepherd who will integrate the patches quickly. This kind of work is easily invalidated when the include list is changed by another developer, requiring us to redo the patch. [Note that Joseph has stepped up to the plate for this, thanks Joseph!]
>
>
> This is the end of my proposal, feedback would be appreciated.
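
The 50% guideline in the quoted proposal could be applied mechanically along the following lines. This is only a sketch: `pickPchHeaders` and its input format are hypothetical, and it does not model the proposal's exception for very large headers that are used by fewer than half of the files.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Returns the headers included by at least `threshold` (default 50%) of the
// given source files. `includesPerSource` maps a source file name to the set
// of headers it includes directly.
std::vector<std::string> pickPchHeaders(
    const std::map<std::string, std::set<std::string>>& includesPerSource,
    double threshold = 0.5)
{
  // Count how many source files include each header.
  std::map<std::string, std::size_t> useCount;
  for (const auto& entry : includesPerSource) {
    for (const auto& header : entry.second) {
      ++useCount[header];
    }
  }

  // Keep the headers whose share of source files meets the threshold.
  std::vector<std::string> selected;
  const double total = static_cast<double>(includesPerSource.size());
  for (const auto& entry : useCount) {
    if (total > 0 && entry.second / total >= threshold) {
      selected.push_back(entry.first);
    }
  }
  return selected; // ordered by header name, since std::map iterates by key
}
```

The selected names would then be the #include lines placed in mesos_common.h, with the remaining headers included separately after it.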