You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@groovy.apache.org by Cédric Champeau <cc...@apache.org> on 2017/04/09 09:53:18 UTC

Towards a better compiler

Hi team!

I would like to setup some additional goals for the next release of Groovy.
As you may know, Gradle 3.5, released tomorrow, will ship with a build
cache [1]. But Groovy is causing us some troubles, because the output of
compilation is not reproducible. In other words, for the same inputs, we
can randomly get a different output. To be clear, while the output of
Groovy is semantically correct, we cannot guarantee that for the same
sources, the same bytecode, byte to byte, is going to be generated. This is
a big problem for Gradle, because it affects the cache key, and a cache
miss has terrible consequences: since we need to rebuild everything when a
build script classpath changes, a change in the output of a build script
bytecode means we need to invalidate a full build...

As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
enough. We should revise our usage of maps and sets, and use their linked
counterparts when it makes sense. It, alone, cannot guarantee that we have
reproducible builds. In particular, cross platform. There are hundreds of
places where we use hash sets/maps, with objects that do not implement
equals/hashcode, for example (and since those structures are mutable, we're
very lucky because the default hashcode uses the system one, which is
immutable).

Eventually, if you have read my blog post about performance of Gradle 3.4
[4], you would understand that the Java compiler does a much better job
than we do as separating things required by the compiler from things
required in user space. In particular, we at Groovy offer AST
transformations, which is similar, but not equivalent, to what annotation
processors are for javac. The problem is that Groovy only has a single
compile classpath, which includes both the "compiler plugins" (AST
transformations) and implementation dependencies of the project we compile.
So we're effectively mixing things that shouldn't be mixed:

- annotations for AST transformations should be on the compile classpath
- implementation of the AST transformations should be on the AST
transformations path (compiler classpath)

If we don't do this, we cannot be as smart as what we do in the Java world,
and compute what is relevant in terms of ABI (application binary
interface). So any change to the classpath, needs to be considered a
breaking change and we need to recompile everything. This makes the
implementation of a Groovy incremental compiler effectively impossible.
Furthermore, it prevents the AST transform implementors to use the
libraries they want as dependencies, without leaking them to the user
compile classpath (imagine an AST xform which uses Guava 18, while the user
wants Guava 17 on his classpath). In practice, the implementation
dependencies of the AST xform are only required at compile time, not
runtime, so they should be separate.

In short, I vote for the adoption of a new `-compilerpath` where we would
put everything that affects the compiler itself, and live `--classpath` for
the user space. Doing this would let tools like Gradle be much smarter
about what they can do.

[1] https://docs.gradle.org/3.5-rc-3/release-notes.html
[2] https://issues.apache.org/jira/browse/GROOVY-8142
[3] https://issues.apache.org/jira/browse/GROOVY-8148
[4] https://blog.gradle.org/incremental-compiler-avoidance

Re: Towards a better compiler

Posted by Jochen Theodorou <bl...@gmx.org>.
On 09.04.2017 11:53, C�dric Champeau wrote:
[...]
> There
> are hundreds of places where we use hash sets/maps, with objects that do
> not implement equals/hashcode, for example (and since those structures
> are mutable, we're very lucky because the default hashcode uses the
> system one, which is immutable).

if we use a set without equals and hashcode, then I guess a list would 
be better. For the maps it really depends.

>[...] In particular, we at Groovy offer AST
> transformations, which is similar, but not equivalent, to what
> annotation processors are for javac. The problem is that Groovy only has
> a single compile classpath, which includes both the "compiler plugins"
> (AST transformations) and implementation dependencies of the project we
> compile. So we're effectively mixing things that shouldn't be mixed:
>
> - annotations for AST transformations should be on the compile classpath
> - implementation of the AST transformations should be on the AST
> transformations path (compiler classpath)

internally this is separated with the transform loader. Just made no 
sense so far to extend that to FileSystemCompiler. Is Gradle going to 
use FileSystemcompiler now? Because before it did afaik not and could 
have had this already. This is also the reason

> If we don't do this, we cannot be as smart as what we do in the Java
> world, and compute what is relevant in terms of ABI (application binary
> interface). So any change to the classpath, needs to be considered a
> breaking change and we need to recompile everything. This makes the
> implementation of a Groovy incremental compiler effectively impossible.

That means in an incremental compilation scenario the classpath changes? 
The source set I expect to change, yes, but why the compile classpath? 
Would be nice if you could describe this a bit in more detail.

> Furthermore, it prevents the AST transform implementors to use the
> libraries they want as dependencies, without leaking them to the user
> compile classpath (imagine an AST xform which uses Guava 18, while the
> user wants Guava 17 on his classpath). In practice, the implementation
> dependencies of the AST xform are only required at compile time, not
> runtime, so they should be separate.

here I can agree again.. the main problem I think is the groovy runtime 
itself. Is a Closure on the compile classpath identical with one on the 
compiler classpath? Does it make sense to have a transform let�s say use 
Groovy 2.0 for the implementation on a Groovy 2.5 source?

> In short, I vote for the adoption of a new `-compilerpath` where we
> would put everything that affects the compiler itself, and live
> `--classpath` for the user space. Doing this would let tools like Gradle
> be much smarter about what they can do.
>
> [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> [2] https://issues.apache.org/jira/browse/GROOVY-8142
> [3] https://issues.apache.org/jira/browse/GROOVY-8148
> [4] https://blog.gradle.org/incremental-compiler-avoidance

sure... the identity question determines the complexity of the change 
though. And how Gradle would use it as well of course

bye Jochen


Re: Towards a better compiler

Posted by Jochen Theodorou <bl...@gmx.org>.
On 21.04.2017 14:54, Jesper Steen M�ller wrote:
[....]
> Also: How about the timestamps embedded into classes' initialization
> logic, such as  12: putstatic     #216                // Field
> __timeStamp__239_neverHappen1490860918788:J Are those still around?
> Won't they also prevent repeatable builds?

the command line compiler should no longer produce classes with those. 
only if you compile at runtime and with recompilation enabled they are added

bye Jochen


Re: Towards a better compiler

Posted by Jesper Steen Møller <je...@selskabet.org>.
Hi list

> On 21 Apr 2017, at 14.33, Andres Almiray <aa...@gmail.com> wrote:
> 
> I had a brief chat with Jochen a few days ago regarding this topic.
> 
> Until now the usage of a Set or some other data structure was not really that important.
> If Groovy switches to a SortedSet then results may be more predictable, but, is it in the benefit of the majority?

For a reproducible build, LinkedHashSet would make more sense (as it preservers order, rather than sorting), and doesn't rely on comparability. For subtle dependencies like order of parent classes, this would be preferable.

Fixing the ordering can't possibly be worse than having it be random/unstable.

> What about opening the door for certain strategies to be injected from the outside, thus Gradle may inject certain customizations during the compiler process that make sense to the build tool but not for a Groovy developer compiling with the default Groovy compiler settings. This would require looking for places where custom strategies may be required (such as the particular collection to keep track of names discussed earlier), perhaps relying on ServiceLoader or some other mechanism to discover custom strategies or pick the default ones during compiler bootstrap.
> 
That sounds like a heavyweight solution compared to using LinkedHashSet but I might not grasp the full problem.

Also: How about the timestamps embedded into classes' initialization logic, such as  12: putstatic     #216                // Field __timeStamp__239_neverHappen1490860918788:J Are those still around? Won't they also prevent repeatable builds?

-Jesper


> -------------------------------------------
> Java Champion; Groovy Enthusiast
> http://andresalmiray.com <http://andresalmiray.com/>
> http://www.linkedin.com/in/aalmiray <http://www.linkedin.com/in/aalmiray>
> --
> What goes up, must come down. Ask any system administrator.
> There are 10 types of people in the world: Those who understand binary, and those who don't.
> To understand recursion, we must first understand recursion.
> 
> On Fri, Apr 21, 2017 at 2:24 PM, Graeme Rocher <graeme.rocher@gmail.com <ma...@gmail.com>> wrote:
> Big +1 one for making the compiler reproducible.
> 
> I think the usage of Sets and HashMap has an impact on Groovydoc too
> because Groovydoc generates different output each time it is run.
> 
> For example the "extends" and "implements" output in Groovydoc changes
> the order of the classes each time it is run. This must be down to
> internal use of unordered sets or hash maps.
> 
> Regarding classpath (compiler vs compile classpath), in many cases AST
> transforms reference classes from libraries that are on the "compile"
> classpath. How would you deal with this case? Have them in both
> places?
> 
> Cheers
> 
> On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <cchampeau@apache.org <ma...@apache.org>> wrote:
> > Hi team!
> >
> > I would like to setup some additional goals for the next release of Groovy.
> > As you may know, Gradle 3.5, released tomorrow, will ship with a build cache
> > [1]. But Groovy is causing us some troubles, because the output of
> > compilation is not reproducible. In other words, for the same inputs, we can
> > randomly get a different output. To be clear, while the output of Groovy is
> > semantically correct, we cannot guarantee that for the same sources, the
> > same bytecode, byte to byte, is going to be generated. This is a big problem
> > for Gradle, because it affects the cache key, and a cache miss has terrible
> > consequences: since we need to rebuild everything when a build script
> > classpath changes, a change in the output of a build script bytecode means
> > we need to invalidate a full build...
> >
> > As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> > enough. We should revise our usage of maps and sets, and use their linked
> > counterparts when it makes sense. It, alone, cannot guarantee that we have
> > reproducible builds. In particular, cross platform. There are hundreds of
> > places where we use hash sets/maps, with objects that do not implement
> > equals/hashcode, for example (and since those structures are mutable, we're
> > very lucky because the default hashcode uses the system one, which is
> > immutable).
> >
> > Eventually, if you have read my blog post about performance of Gradle 3.4
> > [4], you would understand that the Java compiler does a much better job than
> > we do as separating things required by the compiler from things required in
> > user space. In particular, we at Groovy offer AST transformations, which is
> > similar, but not equivalent, to what annotation processors are for javac.
> > The problem is that Groovy only has a single compile classpath, which
> > includes both the "compiler plugins" (AST transformations) and
> > implementation dependencies of the project we compile. So we're effectively
> > mixing things that shouldn't be mixed:
> >
> > - annotations for AST transformations should be on the compile classpath
> > - implementation of the AST transformations should be on the AST
> > transformations path (compiler classpath)
> >
> > If we don't do this, we cannot be as smart as what we do in the Java world,
> > and compute what is relevant in terms of ABI (application binary interface).
> > So any change to the classpath, needs to be considered a breaking change and
> > we need to recompile everything. This makes the implementation of a Groovy
> > incremental compiler effectively impossible. Furthermore, it prevents the
> > AST transform implementors to use the libraries they want as dependencies,
> > without leaking them to the user compile classpath (imagine an AST xform
> > which uses Guava 18, while the user wants Guava 17 on his classpath). In
> > practice, the implementation dependencies of the AST xform are only required
> > at compile time, not runtime, so they should be separate.
> >
> > In short, I vote for the adoption of a new `-compilerpath` where we would
> > put everything that affects the compiler itself, and live `--classpath` for
> > the user space. Doing this would let tools like Gradle be much smarter about
> > what they can do.
> >
> > [1] https://docs.gradle.org/3.5-rc-3/release-notes.html <https://docs.gradle.org/3.5-rc-3/release-notes.html>
> > [2] https://issues.apache.org/jira/browse/GROOVY-8142 <https://issues.apache.org/jira/browse/GROOVY-8142>
> > [3] https://issues.apache.org/jira/browse/GROOVY-8148 <https://issues.apache.org/jira/browse/GROOVY-8148>
> > [4] https://blog.gradle.org/incremental-compiler-avoidance <https://blog.gradle.org/incremental-compiler-avoidance>
> 
> 
> 
> --
> Graeme Rocher
> 


Re: Towards a better compiler

Posted by Jochen Theodorou <bl...@gmx.org>.

On 24.04.2017 10:56, C�dric Champeau wrote:
>
>
>     Implementing equals/hashcode does not give you any order in a Set or
>     Map you can rely on. Really, I still don�t get what you target with
>     that. You require them in set to avoid duplicates.
>
>
> I'm not talking about order here. I'm talking about reproducibility. A
> linked hash set guarantees that things are read in the order they were
> inserted. This is in general just enough. For example, we read in the
> bytecode a list of interfaces, we add them to a set. If it's not linked,
> then when we read them back, you have no guarantee. It has nothing to do
> with _sorting_, which would reorder interfaces typically. The 2 uses
> cases are different.

So we agree it is the "linked" ability that gives the order and equals 
and hashcode are secondary

>     then don�t. Let�s change the API to not return Set.
>
> Yes. I would say +100 if it wasn't for performance. Typically if the
> algorithm needs to use  `contains`, a `Set` is way more efficient.

I was saying to use List, because there is no equals and hashcode 
defined. If they are not defined your search will be invalid unless, you 
are searching using the exact same instance.

And... what hashcode function do you suggest for MethodNode and 
ClassNode? Because I think they won't be cheap. If they are too 
expensive "way more efficient" will become very different for small sets.

> And
> that's exactly why we use sets in lots of places. So to be clear, I did
> a quick review of where we used sets and maps, and replaced that with
> linked hash set and linked hash map when we needed both to guarantee
> that the order of insertion is preserved (not an absolute order) *and*
> that the algorithm required fast checks. This is all we need, I think.
> But I'm calling for more than just a quick review, we should recheck all
> our usage precisely. And avoid binary breaking changes, of course :)


as I said, maps are probably different, since there we most probably do 
not use the AST instances as key, and if we did, it is probably a bug. 
For the set cases.... well, maybe it would be better to give the 
container better search abilities instead of exposing a special 
collection here. That way we can ensure we have the implementation we need.

bye Jochen

Re: Towards a better compiler

Posted by Cédric Champeau <ce...@gmail.com>.
>
>
>>
> Implementing equals/hashcode does not give you any order in a Set or Map
> you can rely on. Really, I still don´t get what you target with that. You
> require them in set to avoid duplicates.
>

I'm not talking about order here. I'm talking about reproducibility. A
linked hash set guarantees that things are read in the order they were
inserted. This is in general just enough. For example, we read in the
bytecode a list of interfaces, we add them to a set. If it's not linked,
then when we read them back, you have no guarantee. It has nothing to do
with _sorting_, which would reorder interfaces typically. The 2 uses cases
are different.

>
> then don´t. Let´s change the API to not return Set.


Yes. I would say +100 if it wasn't for performance. Typically if the
algorithm needs to use  `contains`, a `Set` is way more efficient. And
that's exactly why we use sets in lots of places. So to be clear, I did a
quick review of where we used sets and maps, and replaced that with linked
hash set and linked hash map when we needed both to guarantee that the
order of insertion is preserved (not an absolute order) *and* that the
algorithm required fast checks. This is all we need, I think. But I'm
calling for more than just a quick review, we should recheck all our usage
precisely. And avoid binary breaking changes, of course :)

>
>
>
>

Re: Towards a better compiler

Posted by Jochen Theodorou <bl...@gmx.org>.
On 23.04.2017 17:08, C�dric Champeau wrote:
[...]
> Any tool that performs bytecode analysis, or computes signatures is
> affected. So if the compiler is not reproducible, since your at the
> bottom of the execution, you affect all tools that are built on top.
> This affects Gradle, but also Grails, IntelliJ (because different
> bytecode forces reindexing), tools which perform bytecode decoration
> (intercepting Foo#foo() is not the same as intercepting Bar#foo()) so it
> also affects correctness.

I don�t 100% agree, but let�s leave it at that.

>     Also I doubt we can be 100% reproducible as long as we use
>     reflection somewhere, simple because reflection is not stable. Only
>     because of the great contribution from intellij to be able read
>     class files directly without reflection in the compiler we have the
>     ability for *most* classes to have a stable result depending on the
>     occurrence in bytecode. But for for example the Groovy classes
>     directly known by the compiler like Closure, this won�t be the case.
>
>     100% stability is not achievable simply by using linked maps and sets.
>
> I agree, but I don't get the point of not trying to do better.

Didn�t want to say that. I would like to have an achievable and specific 
goal.

> It should
> be pretty obvious that for the same sources and same compile classpath,
> we should produce the same output. I don't get why we wouldn't try to
> fix this, even incrementally, as we found bugs. This is better.

because that is no different "towards a better", than what we did 
before. Ok, maybe I misunderstood and the original mail�s intention.

> I agree
> that we cannot do 100% reproducibility today, because we don't implement
> equals/hashcode, which means we're system dependent (and probably JDK
> dependent too).

Implementing equals/hashcode does not give you any order in a Set or Map 
you can rely on. Really, I still don�t get what you target with that. 
You require them in set to avoid duplicates.

So if you give back a Set<ClassNode>, then it is potentially wrong to 
use a Set as long has ClassNode does not implement equals and hashcode 
*or* this Set should actually be a List - unless of course you can 
ensure two supposed to be equal ClassNodes are not used here, but then 
again the Set should be a List. And I also think it is no solution to 
internally use a LinkedHashSet, and externally use a Set. If the order 
is a given and guaranteed element of your API, then it should be 
reflected in the type and LinkedHashSet should be used for the parameter 
or return type.

Of course the same goes for the usage of MethodNode as set element.

And if we talk about Maps, then the same should be said about the Map 
type at least, because I hope we do not use AST elements as keys anywhere.

This means for me that at least the fix you did for getAllInterfaces is 
not sufficient, since this solution still allows for duplicates. And I 
don�t think ClassNode should have an equals and hashcode either, not 
before with the twin role of being a reference to a type or the type itself.

> And implementing those is doing to be hard, because our
> structures are not immutable (so mutating something that is already in
> the set/map is problematic).

then don�t. Let�s change the API to not return Set.

> Anyway, I'm not asking for 100%
> reproducible today. I'm asking for every dev to realize that we have a
> problem that leads to non deterministic behavior, and that everyone
> should try to take that into consideration when writing code, and
> futhermore, tests.

fair enough...only I would add that this should be also taken into 
consideration when doing reviews.

>     once you include reflection here... Those cases you could solve by
>     sorting though.
>
> Yes, we should avoid reflection as much as possible. I wonder if there
> are still lots of places in the compiler where we do this: reflection
> means loading the classes and too often initializing, which is probably
> no good in any case.

We do not initialize the classes if we must not, which is only the case 
for annotations and annotations using constants - if reflection is used 
for them. Reflection is our fallback. For example the 
GroovyClassLoader/GroovyShell/Eval/GroovyScriptEngine/basically any 
runtime compilation logic will in most cases reflection. And you won�t 
have a transform path or compile class path there either. 
GroovyScriptEngine could possibly changed here somewhat... but I don�t 
see that for the more general cases of GCL, shell and eval.

>         Especially, order of interfaces in
>         declaration type matter (and they are reproducible today), We
>         *must not*
>         reorder them, or it would change semantics (typically for traits). I
>         have fixed the bugs we, in the Gradle team, have identified, but my
>         email was there to mention that we should take better care of this,
>         because we do a pretty bad job at checking that the behavior of the
>         compiler is deterministic.
>
>     well, that was no real option for a long time and no target till
>     now. But if you say your fixes are enough for gradle, then your
>     tests should ensure this stays. And then we don�t do a bad job at it
>     anymore, do we?
>
> I have added tests for the cases I fixed, yes. They are not
> cross-platform, though.

well, I do think the fixes are not sufficient, but I think for other 
reasons than you do ;)

>         Regarding AST transformations classpath, I had actually
>         forgotten that
>         Gradle integrates directly with the compiler, so can use the 2
>         different
>         "classpath". But AFAIK, our CLI doesn't. This should be easy to fix.
>
>     easy fix? depends... first of all... what will you gain from that?
>     Since I have no idea where you get an advantage from those things
>     outside a build tool that goes beyond something like make and ant, I
>     fail to see the advantage
>
> There would be an advantage for the Groovy compiler itself: today, if
> you use `groovyc`, you put everything on the compile classpath. We don't
> make the difference between classes that need to be found for
> implementation of the compiled classes, and those which are dependencies
> of AST transformations. The consequence is the same as the one people
> have with annotation processors when on the compile classpath instead of
> the annotation processor path: mixing classpath which have nothing to do
> together. In particular, say the user code requires Guava 19, but an AST
> transformation was built using Guava 17. Then you have a conflict, but
> the user shouldn't care about this, because Guava 17 is an
> implementation dependency of an AST transformation: nothing that should
> prevent them from using the class they wish, because once compiled,
> Guava 17 is not required anymore. It's about better modelling of
> dependencies, which has direct consequences on user pain.

ok, yes, I agree with that one. I still think for simplicity the current 
behaviour has to be the default.

bye Jochen


Re: Towards a better compiler

Posted by Cédric Champeau <ce...@gmail.com>.
2017-04-21 22:02 GMT+02:00 Jochen Theodorou <bl...@gmx.org>:

> On 21.04.2017 19:46, Cédric Champeau wrote:
> [...]
>
>> The fact is that the compiler, for a
>> specific set of sources + compile classpath, should always produce the
>> same output. Especially on the same machine. If it doesn't, then the
>> compiler is not predictable. It's a bug that needs to be fixed. It has
>> nothing to do with Gradle, and *all* users would benefit from this.
>>
>
> I am curious. How does a user exactly benefit from it outside of a build
> tool like gradle?
>

Any tool that performs bytecode analysis, or computes signatures is
affected. So if the compiler is not reproducible, since your at the bottom
of the execution, you affect all tools that are built on top. This affects
Gradle, but also Grails, IntelliJ (because different bytecode forces
reindexing), tools which perform bytecode decoration (intercepting
Foo#foo() is not the same as intercepting Bar#foo()) so it also affects
correctness.

>
> Also I doubt we can be 100% reproducible as long as we use reflection
> somewhere, simple because reflection is not stable. Only because of the
> great contribution from intellij to be able read class files directly
> without reflection in the compiler we have the ability for *most* classes
> to have a stable result depending on the occurrence in bytecode. But for
> for example the Groovy classes directly known by the compiler like Closure,
> this won´t be the case.
>
> 100% stability is not achievable simply by using linked maps and sets.


I agree, but I don't get the point of not trying to do better. It should be
pretty obvious that for the same sources and same compile classpath, we
should produce the same output. I don't get why we wouldn't try to fix
this, even incrementally, as we found bugs. This is better. I agree that we
cannot do 100% reproducibility today, because we don't implement
equals/hashcode, which means we're system dependent (and probably JDK
dependent too). And implementing those is doing to be hard, because our
structures are not immutable (so mutating something that is already in the
set/map is problematic). Anyway, I'm not asking for 100% reproducible
today. I'm asking for every dev to realize that we have a problem that
leads to non deterministic behavior, and that everyone should try to take
that into consideration when writing code, and futhermore, tests.

>
>
> once you include reflection here... Those cases you could solve by sorting
> though.


Yes, we should avoid reflection as much as possible. I wonder if there are
still lots of places in the compiler where we do this: reflection means
loading the classes and too often initializing, which is probably no good
in any case.

>
>
> Especially, order of interfaces in
>> declaration type matter (and they are reproducible today), We *must not*
>> reorder them, or it would change semantics (typically for traits). I
>> have fixed the bugs we, in the Gradle team, have identified, but my
>> email was there to mention that we should take better care of this,
>> because we do a pretty bad job at checking that the behavior of the
>> compiler is deterministic.
>>
>
> well, that was no real option for a long time and no target till now. But
> if you say your fixes are enough for gradle, then your tests should ensure
> this stays. And then we don´t do a bad job at it anymore, do we?


I have added tests for the cases I fixed, yes. They are not cross-platform,
though.

>
>
> Regarding AST transformations classpath, I had actually forgotten that
>> Gradle integrates directly with the compiler, so can use the 2 different
>> "classpath". But AFAIK, our CLI doesn't. This should be easy to fix.
>>
>
> easy fix? depends... first of all... what will you gain from that? Since I
> have no idea where you get an advantage from those things outside a build
> tool that goes beyond something like make and ant, I fail to see the
> advantage
>

There would be an advantage for the Groovy compiler itself: today, if you
use `groovyc`, you put everything on the compile classpath. We don't make
the difference between classes that need to be found for implementation of
the compiled classes, and those which are dependencies of AST
transformations. The consequence is the same as the one people have with
annotation processors when on the compile classpath instead of the
annotation processor path: mixing classpath which have nothing to do
together. In particular, say the user code requires Guava 19, but an AST
transformation was built using Guava 17. Then you have a conflict, but the
user shouldn't care about this, because Guava 17 is an implementation
dependency of an AST transformation: nothing that should prevent them from
using the class they wish, because once compiled, Guava 17 is not required
anymore. It's about better modelling of dependencies, which has direct
consequences on user pain.

>
> Granted, something like groovydoc gets an advantage, but honestly I do not
> see why there it should be done without sorting to achieve a somewhat
> stable output. And of course that sorting can very well happen in the doc
> tool
>
> bye Jochen
>

Re: Towards a better compiler

Posted by Jochen Theodorou <bl...@gmx.org>.
On 21.04.2017 19:46, C�dric Champeau wrote:
[...]
> The fact is that the compiler, for a
> specific set of sources + compile classpath, should always produce the
> same output. Especially on the same machine. If it doesn't, then the
> compiler is not predictable. It's a bug that needs to be fixed. It has
> nothing to do with Gradle, and *all* users would benefit from this.

I am curious. How does a user exactly benefit from it outside of a build 
tool like gradle?

Also I doubt we can be 100% reproducible as long as we use reflection 
somewhere, simple because reflection is not stable. Only because of the 
great contribution from intellij to be able read class files directly 
without reflection in the compiler we have the ability for *most* 
classes to have a stable result depending on the occurrence in bytecode. 
But for for example the Groovy classes directly known by the compiler 
like Closure, this won�t be the case.

100% stability is not achievable simply by using linked maps and sets.

> Gradle is an important use case because it currently defeats our
> caching, but it's not the only one. IDE indexing is another example.
> It's a serious issue, and we need to consider this a serious bug.

indexing? you mean idea? It does extensive indexing all the time... for 
minutes... without groovy

> BTW, we don't need to _sort_. We need to make sure that for the same
> input, we have the same output.

once you include reflection here... Those cases you could solve by 
sorting though.

> Especially, order of interfaces in
> declaration type matter (and they are reproducible today), We *must not*
> reorder them, or it would change semantics (typically for traits). I
> have fixed the bugs we, in the Gradle team, have identified, but my
> email was there to mention that we should take better care of this,
> because we do a pretty bad job at checking that the behavior of the
> compiler is deterministic.

well, that was no real option for a long time and no target till now. 
But if you say your fixes are enough for gradle, then your tests should 
ensure this stays. And then we don�t do a bad job at it anymore, do we?

> Regarding AST transformations classpath, I had actually forgotten that
> Gradle integrates directly with the compiler, so can use the 2 different
> "classpath". But AFAIK, our CLI doesn't. This should be easy to fix.

easy fix? depends... first of all... what will you gain from that? Since 
I have no idea where you get an advantage from those things outside a 
build tool that goes beyond something like make and ant, I fail to see 
the advantage

Granted, something like groovydoc gets an advantage, but honestly I do 
not see why there it should be done without sorting to achieve a 
somewhat stable output. And of course that sorting can very well happen 
in the doc tool

bye Jochen

Re: Towards a better compiler

Posted by Cédric Champeau <cc...@apache.org>.
2017-04-21 14:33 GMT+02:00 Andres Almiray <aa...@gmail.com>:

> I had a brief chat with Jochen a few days ago regarding this topic.
>
> Until now the usage of a Set or some other data structure was not really
> that important.
> If Groovy switches to a SortedSet then results may be more predictable,
> but, is it in the benefit of the majority?
> What about opening the door for certain strategies to be injected from the
> outside, thus Gradle may inject certain customizations during the compiler
> process that make sense to the build tool but not for a Groovy developer
> compiling with the default Groovy compiler settings. This would require
> looking for places where custom strategies may be required (such as the
> particular collection to keep track of names discussed earlier), perhaps
> relying on ServiceLoader or some other mechanism to discover custom
> strategies or pick the default ones during compiler bootstrap.
>

I don't see the point honestly. The fact is that the compiler, for a
specific set of sources + compile classpath, should always produce the same
output. Especially on the same machine. If it doesn't, then the compiler is
not predictable. It's a bug that needs to be fixed. It has nothing to do
with Gradle, and *all* users would benefit from this. Gradle is an
important use case because it currently defeats our caching, but it's not
the only one. IDE indexing is another example. It's a serious issue, and we
need to consider this a serious bug.

BTW, we don't need to _sort_. We need to make sure that for the same input,
we have the same output. Especially, order of interfaces in declaration
type matter (and they are reproducible today), We *must not* reorder them,
or it would change semantics (typically for traits). I have fixed the bugs
we, in the Gradle team, have identified, but my email was there to mention
that we should take better care of this, because we do a pretty bad job at
checking that the behavior of the compiler is deterministic.

Regarding AST transformations classpath, I had actually forgotten that
Gradle integrates directly with the compiler, so can use the 2 different
"classpath". But AFAIK, our CLI doesn't. This should be easy to fix.

>
> Regarding the 2nd query on referenced classes by AST. Yes, you would be
> forced to define the references classes in both classpaths.
>
> Cheers,
> Andres
>
> -------------------------------------------
> Java Champion; Groovy Enthusiast
> http://andresalmiray.com
> http://www.linkedin.com/in/aalmiray
> --
> What goes up, must come down. Ask any system administrator.
> There are 10 types of people in the world: Those who understand binary,
> and those who don't.
> To understand recursion, we must first understand recursion.
>
> On Fri, Apr 21, 2017 at 2:24 PM, Graeme Rocher <gr...@gmail.com>
> wrote:
>
>> Big +1 one for making the compiler reproducible.
>>
>> I think the usage of Sets and HashMap has an impact on Groovydoc too
>> because Groovydoc generates different output each time it is run.
>>
>> For example the "extends" and "implements" output in Groovydoc changes
>> the order of the classes each time it is run. This must be down to
>> internal use of unordered sets or hash maps.
>>
>> Regarding classpath (compiler vs compile classpath), in many cases AST
>> transforms reference classes from libraries that are on the "compile"
>> classpath. How would you deal with this case? Have them in both
>> places?
>>
>> Cheers
>>
>> On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <cc...@apache.org>
>> wrote:
>> > Hi team!
>> >
>> > I would like to setup some additional goals for the next release of
>> Groovy.
>> > As you may know, Gradle 3.5, released tomorrow, will ship with a build
>> cache
>> > [1]. But Groovy is causing us some troubles, because the output of
>> > compilation is not reproducible. In other words, for the same inputs,
>> we can
>> > randomly get a different output. To be clear, while the output of
>> Groovy is
>> > semantically correct, we cannot guarantee that for the same sources, the
>> > same bytecode, byte to byte, is going to be generated. This is a big
>> problem
>> > for Gradle, because it affects the cache key, and a cache miss has
>> terrible
>> > consequences: since we need to rebuild everything when a build script
>> > classpath changes, a change in the output of a build script bytecode
>> means
>> > we need to invalidate a full build...
>> >
>> > As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
>> > enough. We should revise our usage of maps and sets, and use their
>> linked
>> > counterparts when it makes sense. It, alone, cannot guarantee that we
>> have
>> > reproducible builds. In particular, cross platform. There are hundreds
>> of
>> > places where we use hash sets/maps, with objects that do not implement
>> > equals/hashcode, for example (and since those structures are mutable,
>> we're
>> > very lucky because the default hashcode uses the system one, which is
>> > immutable).
>> >
>> > Eventually, if you have read my blog post about performance of Gradle
>> 3.4
>> > [4], you would understand that the Java compiler does a much better job
>> than
>> > we do as separating things required by the compiler from things
>> required in
>> > user space. In particular, we at Groovy offer AST transformations,
>> which is
>> > similar, but not equivalent, to what annotation processors are for
>> javac.
>> > The problem is that Groovy only has a single compile classpath, which
>> > includes both the "compiler plugins" (AST transformations) and
>> > implementation dependencies of the project we compile. So we're
>> effectively
>> > mixing things that shouldn't be mixed:
>> >
>> > - annotations for AST transformations should be on the compile classpath
>> > - implementation of the AST transformations should be on the AST
>> > transformations path (compiler classpath)
>> >
>> > If we don't do this, we cannot be as smart as what we do in the Java
>> world,
>> > and compute what is relevant in terms of ABI (application binary
>> interface).
>> > So any change to the classpath, needs to be considered a breaking
>> change and
>> > we need to recompile everything. This makes the implementation of a
>> Groovy
>> > incremental compiler effectively impossible. Furthermore, it prevents
>> the
>> > AST transform implementors to use the libraries they want as
>> dependencies,
>> > without leaking them to the user compile classpath (imagine an AST xform
>> > which uses Guava 18, while the user wants Guava 17 on his classpath). In
>> > practice, the implementation dependencies of the AST xform are only
>> required
>> > at compile time, not runtime, so they should be separate.
>> >
>> > In short, I vote for the adoption of a new `-compilerpath` where we
>> would
>> > put everything that affects the compiler itself, and live `--classpath`
>> for
>> > the user space. Doing this would let tools like Gradle be much smarter
>> about
>> > what they can do.
>> >
>> > [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
>> > [2] https://issues.apache.org/jira/browse/GROOVY-8142
>> > [3] https://issues.apache.org/jira/browse/GROOVY-8148
>> > [4] https://blog.gradle.org/incremental-compiler-avoidance
>>
>>
>>
>> --
>> Graeme Rocher
>>
>
>

Re: Towards a better compiler

Posted by Andres Almiray <aa...@gmail.com>.
I had a brief chat with Jochen a few days ago regarding this topic.

Until now the usage of a Set or some other data structure was not really
that important.
If Groovy switches to a SortedSet then results may be more predictable,
but, is it in the benefit of the majority?
What about opening the door for certain strategies to be injected from the
outside, thus Gradle may inject certain customizations during the compiler
process that make sense to the build tool but not for a Groovy developer
compiling with the default Groovy compiler settings. This would require
looking for places where custom strategies may be required (such as the
particular collection to keep track of names discussed earlier), perhaps
relying on ServiceLoader or some other mechanism to discover custom
strategies or pick the default ones during compiler bootstrap.

Regarding the 2nd query on referenced classes by AST. Yes, you would be
forced to define the references classes in both classpaths.

Cheers,
Andres

-------------------------------------------
Java Champion; Groovy Enthusiast
http://andresalmiray.com
http://www.linkedin.com/in/aalmiray
--
What goes up, must come down. Ask any system administrator.
There are 10 types of people in the world: Those who understand binary, and
those who don't.
To understand recursion, we must first understand recursion.

On Fri, Apr 21, 2017 at 2:24 PM, Graeme Rocher <gr...@gmail.com>
wrote:

> Big +1 one for making the compiler reproducible.
>
> I think the usage of Sets and HashMap has an impact on Groovydoc too
> because Groovydoc generates different output each time it is run.
>
> For example the "extends" and "implements" output in Groovydoc changes
> the order of the classes each time it is run. This must be down to
> internal use of unordered sets or hash maps.
>
> Regarding classpath (compiler vs compile classpath), in many cases AST
> transforms reference classes from libraries that are on the "compile"
> classpath. How would you deal with this case? Have them in both
> places?
>
> Cheers
>
> On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <cc...@apache.org>
> wrote:
> > Hi team!
> >
> > I would like to setup some additional goals for the next release of
> Groovy.
> > As you may know, Gradle 3.5, released tomorrow, will ship with a build
> cache
> > [1]. But Groovy is causing us some troubles, because the output of
> > compilation is not reproducible. In other words, for the same inputs, we
> can
> > randomly get a different output. To be clear, while the output of Groovy
> is
> > semantically correct, we cannot guarantee that for the same sources, the
> > same bytecode, byte to byte, is going to be generated. This is a big
> problem
> > for Gradle, because it affects the cache key, and a cache miss has
> terrible
> > consequences: since we need to rebuild everything when a build script
> > classpath changes, a change in the output of a build script bytecode
> means
> > we need to invalidate a full build...
> >
> > As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> > enough. We should revise our usage of maps and sets, and use their linked
> > counterparts when it makes sense. It, alone, cannot guarantee that we
> have
> > reproducible builds. In particular, cross platform. There are hundreds of
> > places where we use hash sets/maps, with objects that do not implement
> > equals/hashcode, for example (and since those structures are mutable,
> we're
> > very lucky because the default hashcode uses the system one, which is
> > immutable).
> >
> > Eventually, if you have read my blog post about performance of Gradle 3.4
> > [4], you would understand that the Java compiler does a much better job
> than
> > we do as separating things required by the compiler from things required
> in
> > user space. In particular, we at Groovy offer AST transformations, which
> is
> > similar, but not equivalent, to what annotation processors are for javac.
> > The problem is that Groovy only has a single compile classpath, which
> > includes both the "compiler plugins" (AST transformations) and
> > implementation dependencies of the project we compile. So we're
> effectively
> > mixing things that shouldn't be mixed:
> >
> > - annotations for AST transformations should be on the compile classpath
> > - implementation of the AST transformations should be on the AST
> > transformations path (compiler classpath)
> >
> > If we don't do this, we cannot be as smart as what we do in the Java
> world,
> > and compute what is relevant in terms of ABI (application binary
> interface).
> > So any change to the classpath, needs to be considered a breaking change
> and
> > we need to recompile everything. This makes the implementation of a
> Groovy
> > incremental compiler effectively impossible. Furthermore, it prevents the
> > AST transform implementors to use the libraries they want as
> dependencies,
> > without leaking them to the user compile classpath (imagine an AST xform
> > which uses Guava 18, while the user wants Guava 17 on his classpath). In
> > practice, the implementation dependencies of the AST xform are only
> required
> > at compile time, not runtime, so they should be separate.
> >
> > In short, I vote for the adoption of a new `-compilerpath` where we would
> > put everything that affects the compiler itself, and live `--classpath`
> for
> > the user space. Doing this would let tools like Gradle be much smarter
> about
> > what they can do.
> >
> > [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> > [2] https://issues.apache.org/jira/browse/GROOVY-8142
> > [3] https://issues.apache.org/jira/browse/GROOVY-8148
> > [4] https://blog.gradle.org/incremental-compiler-avoidance
>
>
>
> --
> Graeme Rocher
>

Re: Towards a better compiler

Posted by Cédric Champeau <cc...@apache.org>.
2017-04-21 14:24 GMT+02:00 Graeme Rocher <gr...@gmail.com>:

> Big +1 one for making the compiler reproducible.
>
> I think the usage of Sets and HashMap has an impact on Groovydoc too
> because Groovydoc generates different output each time it is run.
>
> For example the "extends" and "implements" output in Groovydoc changes
> the order of the classes each time it is run. This must be down to
> internal use of unordered sets or hash maps.
>
> Regarding classpath (compiler vs compile classpath), in many cases AST
> transforms reference classes from libraries that are on the "compile"
> classpath. How would you deal with this case? Have them in both
> places?
>

Yes, they should. Knowing that if you do so, it implies that a potential
incremental compiler would be defeated.

>
> Cheers
>
> On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <cc...@apache.org>
> wrote:
> > Hi team!
> >
> > I would like to setup some additional goals for the next release of
> Groovy.
> > As you may know, Gradle 3.5, released tomorrow, will ship with a build
> cache
> > [1]. But Groovy is causing us some troubles, because the output of
> > compilation is not reproducible. In other words, for the same inputs, we
> can
> > randomly get a different output. To be clear, while the output of Groovy
> is
> > semantically correct, we cannot guarantee that for the same sources, the
> > same bytecode, byte to byte, is going to be generated. This is a big
> problem
> > for Gradle, because it affects the cache key, and a cache miss has
> terrible
> > consequences: since we need to rebuild everything when a build script
> > classpath changes, a change in the output of a build script bytecode
> means
> > we need to invalidate a full build...
> >
> > As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> > enough. We should revise our usage of maps and sets, and use their linked
> > counterparts when it makes sense. It, alone, cannot guarantee that we
> have
> > reproducible builds. In particular, cross platform. There are hundreds of
> > places where we use hash sets/maps, with objects that do not implement
> > equals/hashcode, for example (and since those structures are mutable,
> we're
> > very lucky because the default hashcode uses the system one, which is
> > immutable).
> >
> > Eventually, if you have read my blog post about performance of Gradle 3.4
> > [4], you would understand that the Java compiler does a much better job
> than
> > we do as separating things required by the compiler from things required
> in
> > user space. In particular, we at Groovy offer AST transformations, which
> is
> > similar, but not equivalent, to what annotation processors are for javac.
> > The problem is that Groovy only has a single compile classpath, which
> > includes both the "compiler plugins" (AST transformations) and
> > implementation dependencies of the project we compile. So we're
> effectively
> > mixing things that shouldn't be mixed:
> >
> > - annotations for AST transformations should be on the compile classpath
> > - implementation of the AST transformations should be on the AST
> > transformations path (compiler classpath)
> >
> > If we don't do this, we cannot be as smart as what we do in the Java
> world,
> > and compute what is relevant in terms of ABI (application binary
> interface).
> > So any change to the classpath, needs to be considered a breaking change
> and
> > we need to recompile everything. This makes the implementation of a
> Groovy
> > incremental compiler effectively impossible. Furthermore, it prevents the
> > AST transform implementors to use the libraries they want as
> dependencies,
> > without leaking them to the user compile classpath (imagine an AST xform
> > which uses Guava 18, while the user wants Guava 17 on his classpath). In
> > practice, the implementation dependencies of the AST xform are only
> required
> > at compile time, not runtime, so they should be separate.
> >
> > In short, I vote for the adoption of a new `-compilerpath` where we would
> > put everything that affects the compiler itself, and live `--classpath`
> for
> > the user space. Doing this would let tools like Gradle be much smarter
> about
> > what they can do.
> >
> > [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> > [2] https://issues.apache.org/jira/browse/GROOVY-8142
> > [3] https://issues.apache.org/jira/browse/GROOVY-8148
> > [4] https://blog.gradle.org/incremental-compiler-avoidance
>
>
>
> --
> Graeme Rocher
>

Re: Towards a better compiler

Posted by Graeme Rocher <gr...@gmail.com>.
Big +1 one for making the compiler reproducible.

I think the usage of Sets and HashMap has an impact on Groovydoc too
because Groovydoc generates different output each time it is run.

For example the "extends" and "implements" output in Groovydoc changes
the order of the classes each time it is run. This must be down to
internal use of unordered sets or hash maps.

Regarding classpath (compiler vs compile classpath), in many cases AST
transforms reference classes from libraries that are on the "compile"
classpath. How would you deal with this case? Have them in both
places?

Cheers

On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <cc...@apache.org> wrote:
> Hi team!
>
> I would like to setup some additional goals for the next release of Groovy.
> As you may know, Gradle 3.5, released tomorrow, will ship with a build cache
> [1]. But Groovy is causing us some troubles, because the output of
> compilation is not reproducible. In other words, for the same inputs, we can
> randomly get a different output. To be clear, while the output of Groovy is
> semantically correct, we cannot guarantee that for the same sources, the
> same bytecode, byte to byte, is going to be generated. This is a big problem
> for Gradle, because it affects the cache key, and a cache miss has terrible
> consequences: since we need to rebuild everything when a build script
> classpath changes, a change in the output of a build script bytecode means
> we need to invalidate a full build...
>
> As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> enough. We should revise our usage of maps and sets, and use their linked
> counterparts when it makes sense. It, alone, cannot guarantee that we have
> reproducible builds. In particular, cross platform. There are hundreds of
> places where we use hash sets/maps, with objects that do not implement
> equals/hashcode, for example (and since those structures are mutable, we're
> very lucky because the default hashcode uses the system one, which is
> immutable).
>
> Eventually, if you have read my blog post about performance of Gradle 3.4
> [4], you would understand that the Java compiler does a much better job than
> we do as separating things required by the compiler from things required in
> user space. In particular, we at Groovy offer AST transformations, which is
> similar, but not equivalent, to what annotation processors are for javac.
> The problem is that Groovy only has a single compile classpath, which
> includes both the "compiler plugins" (AST transformations) and
> implementation dependencies of the project we compile. So we're effectively
> mixing things that shouldn't be mixed:
>
> - annotations for AST transformations should be on the compile classpath
> - implementation of the AST transformations should be on the AST
> transformations path (compiler classpath)
>
> If we don't do this, we cannot be as smart as what we do in the Java world,
> and compute what is relevant in terms of ABI (application binary interface).
> So any change to the classpath, needs to be considered a breaking change and
> we need to recompile everything. This makes the implementation of a Groovy
> incremental compiler effectively impossible. Furthermore, it prevents the
> AST transform implementors to use the libraries they want as dependencies,
> without leaking them to the user compile classpath (imagine an AST xform
> which uses Guava 18, while the user wants Guava 17 on his classpath). In
> practice, the implementation dependencies of the AST xform are only required
> at compile time, not runtime, so they should be separate.
>
> In short, I vote for the adoption of a new `-compilerpath` where we would
> put everything that affects the compiler itself, and live `--classpath` for
> the user space. Doing this would let tools like Gradle be much smarter about
> what they can do.
>
> [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> [2] https://issues.apache.org/jira/browse/GROOVY-8142
> [3] https://issues.apache.org/jira/browse/GROOVY-8148
> [4] https://blog.gradle.org/incremental-compiler-avoidance



-- 
Graeme Rocher